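diff --git a/HRInfo.md b/HRInfo.md
new file mode 100644
index 0000000..6763dd9
--- /dev/null
+++ b/HRInfo.md
@@ -0,0 +1,469 @@
+# 2020 Spring Recruitment: Referral Information from Major Companies (continuously updated - March 17)
+
+**update: 2020-04-09 21:37:03**
+
+Note: the information comes from Nowcoder and is only a simple compilation here; please verify it yourself.
+Entries are roughly ordered by posting time; some postings have deadlines, so please keep track of them yourself.
+
+### Updating takes effort, so please like to show support! Bookmark to follow the latest recruitment news!
+### Follow the WeChat official account TALKDATA for more interview tips!
+### Search Bilibili for TALKDATA to find plenty of interview-experience videos
+
+----------
+
+## Official recruitment sites of major internet companies
+https://zhuanlan.zhihu.com/p/97099493
+
+## Below: spring recruitment, supplementary hiring, and class-of-2021 internships
+
+### 【Bilibili referral】Bilibili 2020 spring campus recruitment officially launched! 2020-4-9
+https://www.nowcoder.com/discuss/371425?type=7&order=0&pos=81&page=7
+
+### 【Trend Micro】Class-of-2021 summer early-batch Offer+ program, referral code 9246 2020-4-9
+https://www.nowcoder.com/discuss/383040?type=7&order=0&pos=79&page=4
+
+### 【Envision 2021 spring internship】Instant referral! Plenty of headcount! Come if you need a push to land an offer 2020-4-9
+https://www.nowcoder.com/discuss/371672?type=7&order=0&pos=54&page=15
+
+### 【Pinduoduo】2020 summer internship referral, hop on!!! 2020-4-9
+https://www.nowcoder.com/discuss/393350?type=7&order=0&pos=48&page=11
+
+### 【NetEase Games (Interactive Entertainment)】Class-of-2021 internship referral, super! simple! (opens April 13) 2020-4-9
+https://www.nowcoder.com/discuss/403853?type=7&order=0&pos=23&page=1
+
+### 【TuSimple】2020 spring recruitment 【no written test / full-time / internship】 2020-4-9
+https://www.nowcoder.com/discuss/380005?type=7&order=0&pos=20&page=12
+
+### 【CVTE (Shiyuan)】2021 long-term intern recruitment brochure 【intern recruitment】 2020-4-9
+https://www.nowcoder.com/discuss/368650?type=7&order=0&pos=17&page=3
+
+### 【Kuaishou】Campus referral, internship referral, and spring supplementary-hiring referral 2020-4-9
+https://www.nowcoder.com/discuss/398991?type=7&order=0&pos=13&page=2
+
+### 【Alibaba】Taobao messaging middle platform - Java software engineer 2020-4-9
+https://www.nowcoder.com/discuss/403026?type=7&order=3&pos=187&page=0
+
+### 【Shopee】2020 Shopee internship program officially launched 2020-4-9
+https://www.nowcoder.com/discuss/403062?type=7&order=3&pos=178&page=0
+
+### 【Alibaba Cloud】Java development engineer 【full-time conversion possible】 2020-4-9
+https://www.nowcoder.com/discuss/403622?type=7&order=3&pos=95&page=0
+
+### 【Baidu】Direct department referral / company referral, various positions 【experienced hires】 2020-4-9
+https://www.nowcoder.com/discuss/403663?type=7&order=3&pos=90&page=0
+
+### 【Alibaba】Class-of-2021 intern referrals for a core department that is also easy to get into!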
2020-4-9
+https://www.nowcoder.com/discuss/403666?type=7&order=3&pos=88&page=1
+
+### 【Du Xiaoman】Financial 2020 summer intern recruitment referral 2020-4-9
+https://www.nowcoder.com/discuss/403683?type=7&order=3&pos=82&page=1
+
+### 【Alibaba Cloud】High-availability architecture team internship hiring, 100% full-time conversion, headcount 20+ 2020-4-9
+https://www.nowcoder.com/discuss/403702?type=7&order=3&pos=80&page=1
+
+### 【JD.com】2020 campus and experienced hires can also be referred! Referral code inside 2020-4-9
+https://www.nowcoder.com/discuss/403847?type=7&order=3&pos=62&page=1
+
+### 【Alibaba】Java intern, Big Import product technology department 2020-4-9
+https://www.nowcoder.com/discuss/404034?type=7&order=3&pos=38&page=0
+
+### 【Baidu】Infrastructure department hiring backend R&D engineers, send your resume if interested!!! 2020-4-9
+https://www.nowcoder.com/discuss/404044?type=7&order=3&pos=35&page=0
+
+### 【Alibaba Enterprise Intelligence】Last call for spring internships, high demand, full-time conversion possible 2020-4-9
+https://www.nowcoder.com/discuss/404051?type=7&order=3&pos=30&page=0
+
+### 【iFLYTEK】Spring recruitment referrals in full swing!!! 2020-3-17
+https://www.nowcoder.com/discuss/384336?type=0&order=undefined&pos=96&page=1
+
+### 【Du Xiaoman】Summer internship referral 2020-3-17
+https://www.nowcoder.com/discuss/384339?type=0&order=undefined&pos=92&page=1
+
+### 【Momo】Summer intern recruitment is now open 2020-3-17
+https://www.nowcoder.com/discuss/374285?type=0&order=undefined&pos=93&page=1
+
+### 【China Merchants Bank Credit Card Center】Classes of 2021/2020: skip screening, straight to interview 2020-3-17
+https://www.nowcoder.com/discuss/384342?type=0&order=undefined&pos=84&page=0
+
+### 【Qutoutiao】 | Technical internship roundup | Not on board yet? Only you are missing 2020-3-17
+https://www.nowcoder.com/discuss/369159?type=0&order=undefined&pos=77&page=1
+
+### 【Baidu】Java R&D engineer (spring internship, full-time conversion possible) 2020-3-17
+https://www.nowcoder.com/discuss/375725?type=0&order=undefined&pos=63&page=1
+
+### 【Alibaba Cloud】Visual Intelligence Open Platform 2020-3-17
+https://www.nowcoder.com/discuss/384348?type=7&order=3&pos=30&page=1
+
+### 【Alibaba Cloud】High-availability architecture team summer intern recruitment 2020-3-17
+https://www.nowcoder.com/discuss/384377?type=7&order=3&pos=23&page=0
+
+### 【Ant Financial】【class-of-2021 internship】Resumes go straight to the manager; you can still apply even if you applied to other teams!
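2020-3-17
+https://www.nowcoder.com/discuss/384378?type=7&order=3&pos=22&page=1
+
+### 【Pinduoduo】Class-of-2020 campus recruitment, Pinyue Program - tech elite session 2020-3-17
+https://www.nowcoder.com/discuss/384387?type=7&order=3&pos=21&page=1
+
+### 【SenseTime】【HR direct referral】Internship positions galore 2020-3-7
+https://www.nowcoder.com/discuss/376615?type=7&order=3&pos=164&page=0
+
+### 【Tencent WeChat Group (WXG) summer intern referral】Direct referral and progress tracking available 2020-3-7
+https://www.nowcoder.com/discuss/376641?type=7&order=3&pos=155&page=1
+
+### 【Mogujie】Referrals open (2021 internships; 2020 supplementary hiring limited to technical roles) 2020-3-7
+https://www.nowcoder.com/discuss/376717?type=7&order=3&pos=136&page=0
+
+### 【4399】Spring recruitment written test starts next week! Referral code v5okf, get on board quickly 2020-3-7
+https://www.nowcoder.com/discuss/376755?type=7&order=3&pos=128&page=1
+
+### 【Ant Financial】- Data Platform department - hiring Java / algorithm engineers and more 2020-3-7
+https://www.nowcoder.com/discuss/150772
+
+### 【Sangfor referral】Sangfor spring recruitment has started! Every request answered 2020-3-7
+https://www.nowcoder.com/discuss/376768?type=7&order=3&pos=124&page=1
+
+### 【Meituan-Dianping】Spring recruitment has started; referral QR code in the post 2020-3-7
+https://www.nowcoder.com/discuss/376851?type=7&order=3&pos=97&page=0
+
+### 【NetEase】Spring referrals for 2021 internships + 2020 supplementary hiring, progress tracking available 2020-3-7
+https://www.nowcoder.com/discuss/376934?type=7&order=3&pos=69&page=0
+
+### 【Alipay】Financial core internship referral ---- a senior colleague who does not code on weekends is waiting to hear from you 2020-3-7
+https://www.nowcoder.com/discuss/376935?type=7&order=3&pos=68&page=1
+
+### 【Alibaba】- Taobao Live team spring recruitment (and interns), class of 2020 also welcome 2020-3-7
+https://www.nowcoder.com/discuss/376985?type=7&order=3&pos=48&page=1
+
+### 【Alibaba】Data Technology and Products department: 2021 internships + campus recruitment have started 2020-3-7
+https://www.nowcoder.com/discuss/377062?type=7&order=3&pos=17&page=1
+
+### 【Yuanfudao spring referral】350k campus offers await you 2020-2-27
+https://www.nowcoder.com/discuss/370740?type=7&order=3&pos=479&page=2
+
+### 【Bilibili】/B站/bilibili class-of-2020 spring / campus recruitment referral 2020-2-27
+https://www.nowcoder.com/discuss/370914?type=7&order=3&pos=406&page=7
+
+### 【SmartX】 Internships focused on infrastructure 2020-2-27
+https://www.nowcoder.com/discuss/371031?type=7&order=3&pos=358&page=1
+
+### 【Douyu】2020 spring supplementary hiring and class-of-2021 internship recruitment officially launched!!! 2020-2-27
+https://www.nowcoder.com/discuss/371494?type=7&order=3&pos=168&page=1
+
+### 【Alibaba Tmall Global】Interns graduating in 2021, take an early look 2020-2-27
+https://www.nowcoder.com/discuss/371648?type=7&order=3&pos=108&page=0
+
+### 【Envision】2021 internship referral, instant referral! No time to explain, hop on!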
2020-2-27
+https://www.nowcoder.com/discuss/371672?type=7&order=3&pos=95&page=1
+
+### 【Ctrip referral】2020 spring recruitment referral 2020-2-27
+https://www.nowcoder.com/discuss/371800?type=7&order=3&pos=63&page=1
+
+### 【Meituan-Dianping】Basic R&D Platform summer interns, hired directly by the team (roles: test development, development 2020-2-27
+https://www.nowcoder.com/discuss/371810?type=7&order=3&pos=60&page=0
+
+### 【NetEase Games (Interactive Entertainment)】2020 spring supplementary hiring / 2021 summer internships are here 2020-2-27
+https://www.nowcoder.com/discuss/371825?type=7&order=3&pos=57&page=1
+
+### 【iFLYTEK】Spring recruitment referral 2020-2-27
+https://www.nowcoder.com/discuss/371832?type=7&order=3&pos=53&page=0
+
+### 【Ant Financial oceanbase】Team class-of-2021 internship recruitment 2020-2-27
+https://www.nowcoder.com/discuss/371861?type=7&order=3&pos=47&page=1
+
+### 【Ant Financial】International Business Group intern and campus recruitment 2020-2-27
+https://www.nowcoder.com/discuss/371906?type=7&order=3&pos=37&page=1
+
+### 【SHEIN】 Cross-border e-commerce unicorn, 2020 spring referral, help with progress checks 2020-2-27
+https://www.nowcoder.com/discuss/371969?type=7&order=3&pos=14&page=0
+
+### 【Ant Financial】Class-of-2021 internship - no written test - full-time conversion possible - Java development 2020-2-27
+https://www.nowcoder.com/discuss/371979?type=7&order=3&pos=10&page=1
+
+### 【Alibaba New Retail Supply Chain Platform business unit】Campus + experienced hires 2020-2-27
+https://www.nowcoder.com/discuss/371991?type=7&order=3&pos=7&page=1
+
+### 【Ant Financial Oceanbase team】Intern recruitment has started!!! 2020-2-23
+https://www.nowcoder.com/discuss/369212?type=7&order=3&pos=164&page=1
+
+### 【Meituan-Dianping】Search backend intern 2020-2-23
+https://www.nowcoder.com/discuss/369264?type=7&order=3&pos=155&page=1
+
+### 【Huawei 2012 Labs Central Software Institute】Spring recruitment & internships have started 2020-2-23
+https://www.nowcoder.com/discuss/369269?type=7&order=3&pos=153&page=1
+
+### 【OnePlus】Various positions, referrals for the class of 2020 2020-2-23
+https://www.nowcoder.com/discuss/369280?type=7&order=3&pos=150&page=2
+
+### 【Inspur Group】Referral, spring supplementary hiring, Inspur Group class-of-2020 campus recruitment 2020-2-23
+https://www.nowcoder.com/discuss/369553?type=7&order=3&pos=95&page=1
+
+### 【TuSimple】Internship recruitment 2020-2-23
+https://www.nowcoder.com/discuss/369654?type=7&order=3&pos=69&page=0
+
+### 【Microsoft Suzhou】O365 Tech Talk & experienced-hire referral 2020-2-23
+https://www.nowcoder.com/discuss/369868?type=7&order=3&pos=23&page=1
+
+### 【Cainiao Logistics International】Spring internship campus recruitment has started; if you are not on board yet, hurry up 2020-2-23
+https://www.nowcoder.com/discuss/369902?type=7&order=3&pos=20&page=0
+
+### 【Alibaba DingTalk】Voyager Program - internal referrals for class-of-2021 technical interns have started!
2020-2-23
+https://www.nowcoder.com/discuss/369920?type=7&order=3&pos=15&page=1
+
+### 【NetEase Leihuo】Internships & spring supplementary hiring have launched; a referral takes just a few steps 2020-2-23
+https://www.nowcoder.com/discuss/369936?type=7&order=3&pos=9&page=1
+
+### 【Alibaba】Big data development intern 2020-02-16
+https://www.nowcoder.com/discuss/366504?type=7&order=3&pos=286&page=1
+
+### 【AsiaInfo】Class-of-2021 intern recruitment officially launched! 2020-02-16
+https://www.nowcoder.com/discuss/366630?type=7&order=3&pos=242&page=1
+
+### 【Alibaba - Alimama】Spring internship referral 2020-02-16
+https://www.nowcoder.com/discuss/366714?type=7&order=3&pos=199&page=0
+
+### 【Shenzhou Information (DCITS)】2020 spring recruitment 2020-02-16
+https://www.nowcoder.com/discuss/366810?type=7&order=3&pos=162&page=1
+
+### 【Alibaba DingTalk】Class-of-2021 technical intern referrals have started 2020-02-16
+https://www.nowcoder.com/discuss/366988?type=7&order=3&pos=103&page=1
+
+### 【Cisco】2020 campus supplementary hiring, software development engineer, based in Shanghai 2020-02-16
+https://www.nowcoder.com/discuss/367074?type=7&order=3&pos=88&page=1
+
+### 【Huya Live】 2021 internship / 2020 campus / experienced hires 2020-02-16
+https://www.nowcoder.com/discuss/367086?type=7&order=3&pos=83&page=3
+
+### 【SenseTime】- JAVA development, class-of-2020 supplementary hiring 2020-02-16
+https://www.nowcoder.com/discuss/367154?type=7&order=3&pos=60&page=1
+
+### 【Yealink】Class-of-2020 spring recruitment 【open the post to get a referral】 2020-02-16
+https://www.nowcoder.com/discuss/366781?type=7&order=0&pos=42&page=2
+
+### 【TPLINK】2020 spring recruitment referral 2020-02-16
+https://www.nowcoder.com/discuss/366976?type=7&order=0&pos=29&page=1
+
+### 【Sogou】2020 spring recruitment main-batch referral 2020-02-16
+https://www.nowcoder.com/discuss/366789?type=7&order=0&pos=26&page=1
+
+### 【Songguo Chuxing】Campus recruitment ~~ good pay and benefits, applications welcome 2020-02-16
+https://www.nowcoder.com/discuss/367216?type=7&order=0&pos=18&page=1
+
+### 【WeChat Official Accounts backend team】- 2020 summer intern recruitment 【full-time conversion possible】 2020-02-16
+https://www.nowcoder.com/discuss/367353?type=7&order=0&pos=15&page=1
+
+### 【FanRuan spring recruitment】Fully online written test and interviews, based in Nanjing or Wuxi, referrals available 2020-02-16
+https://www.nowcoder.com/discuss/366846?type=7&order=0&pos=13&page=2
+
+### 【XD Network & TapTap】 2020 spring campus recruitment has started!
2020-02-16
+https://www.nowcoder.com/discuss/367324?type=7&order=0&pos=12&page=2
+
+### 【Shopee】 (based in Singapore) referrals all year round!! New and past graduates both welcome 2020-02-16
+https://www.nowcoder.com/discuss/367382?type=7&order=0&pos=10&page=0
+
+### 【YITU Technology】Referral 2020-02-16
+https://www.nowcoder.com/discuss/364637?type=7&order=0&pos=8&page=2
+
+### 【WeChat】Mini Program technical team hiring summer interns 2020-02-12
+https://www.nowcoder.com/discuss/366099?type=7&order=3&pos=114&page=1
+
+### 【Nanjing Qingshu】Spring early batch! Generous pay, non-technical roles too ~ you can interview from home 2020-02-12
+https://www.nowcoder.com/discuss/366310?type=7&order=3&pos=37&page=1
+
+### 【iFLYTEK】iFLYTEK class-of-2020 campus recruitment: spring supplementary-hiring referrals are here! 2020-02-12
+https://www.nowcoder.com/discuss/366335?type=7&order=3&pos=24&page=0
+
+### 【Qi-Anxin】2020 spring recruitment & internship referral 2020-02-12
+https://www.nowcoder.com/discuss/366347?type=7&order=3&pos=17&page=1
+
+### 【Deeproute.ai】Autonomous driving company: referral 2020-02-07
+https://www.nowcoder.com/discuss/364902?type=7&order=3&pos=86&page=1
+
+### 【Momo Beijing】Internship recruitment, technical and product roles 2020-02-07
+https://www.nowcoder.com/discuss/364862?type=7&order=0&pos=151&page=1
+
+### 【Alibaba Cloud Computing Co., Ltd.】Alibaba Cloud - Elastic Compute - R&D engineer, JAVA 2020-02-07
+https://www.nowcoder.com/discuss/365054?type=7&order=0&pos=134&page=1
+
+### 【Alimama】Taobao Alliance class-of-2021 summer internships have started 2020-02-07
+https://www.nowcoder.com/discuss/364820?type=7&order=0&pos=107&page=1
+
+### 【Alibaba Cloud】Intelligent storage / high-performance computing 2020 intern recruitment! 2020-02-07
+https://www.nowcoder.com/discuss/365019?type=7&order=0&pos=93&page=2
+
+### 【TCL】Class-of-2020 campus recruitment launched, more than 400 positions await you! 2020-02-07
+https://www.nowcoder.com/discuss/365063?type=7&order=0&pos=73&page=1
+
+### 【Taobao】 Class-of-2021 spring referral, New Retail Technology Business Group - Taobao Technology department, technical & non-technical 2020-02-07
+https://www.nowcoder.com/discuss/365108?type=7&order=0&pos=55&page=1
+
+### 【37 Interactive Entertainment】Campus and experienced-hire positions 2020-02-07
+https://www.nowcoder.com/discuss/365171?type=7&order=0&pos=34&page=1
+
+### Backend development internship - Java - Autonomous Driving Infrastructure department, Shanghai 2020-02-07
+https://www.nowcoder.com/discuss/358568?type=7&order=0&pos=17&page=1
+
+### 【Microsoft Suzhou】New product R&D team is hiring!
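2020-02-07
+https://www.nowcoder.com/discuss/365059?type=7&order=0&pos=15&page=1
+
+### 【Yuanfudao】Experienced hires or class of 2020, referrals for all kinds of engineers 2020-02-07
+https://www.nowcoder.com/discuss/365035?type=7&order=0&pos=13&page=1
+
+### 【JD.com】JD Retail - Technology and Data Middle Platform intern recruitment 2020-02-07
+https://www.nowcoder.com/discuss/364913?type=0&order=0&pos=14&page=1
+
+### 【Bosssoft】Online spring recruitment channel is open as usual; openings in frontend / JAVA / big data / algorithms / software testing! 2020-02-04
+https://www.nowcoder.com/discuss/364394?type=7&order=0&pos=91&page=1
+
+### 【Alibaba e-commerce: an emerging core business】2021 campus recruitment launched 2020-02-04
+https://www.nowcoder.com/discuss/364592?type=0&order=0&pos=100&page=0
+
+### 【Alibaba Supply Chain Platform business unit】Class-of-2021 summer intern recruitment (spring recruitment) 2020-02-04
+https://www.nowcoder.com/discuss/364611?type=0&order=0&pos=48&page=0
+
+### 【OPPO referral】Class-of-2020 graduates: software, hardware, product, general functions, marketing, and more 2020-02-04
+https://www.nowcoder.com/discuss/364442?type=0&order=0&pos=9&page=3
+
+### 【Grab referral - major international internet company】Experienced and campus hires both! No overtime! 2020-02-04
+https://www.nowcoder.com/discuss/337566?type=7&order=0&pos=68&page=2
+
+### 【ByteDance】 Campus / experienced / internship, hired directly by the Data Platform team
+https://www.nowcoder.com/discuss/363916?type=7&order=3&pos=30&page=1
+
+### 【Alibaba】CBU Wireless early spring recruitment: java/android/iOS
+https://www.nowcoder.com/discuss/364007?type=7&order=3&pos=25&page=1
+
+### 【Alibaba】Urgently hiring frontend engineers (internship, campus, or experienced)
+https://www.nowcoder.com/discuss/364086?type=7&order=3&pos=18&page=1
+
+### 【TuSimple】Frontend, backend, and algorithm roles, internship referral
+https://www.nowcoder.com/discuss/364105?type=7&order=3&pos=14&page=1
+
+### 【Jerry Ai】 Software engineer hiring, including internships (remote / Toronto)
+https://www.nowcoder.com/discuss/364158?type=7&order=3&pos=10&page=0
+
+### 【58.com】Direct department referral - campus + experienced - algorithms and backend
+https://www.nowcoder.com/discuss/362453?type=0&order=0&pos=140&page=1
+
+### 【JD.com】Hiring interns, full-time conversion possible - frontend / backend development engineers
+https://www.nowcoder.com/discuss/363081?type=0&order=0&pos=104&page=1
+
+### 【NetEase Games】2020 spring referral (Interactive Entertainment and Leihuo business groups), lots of headcount
+https://www.nowcoder.com/discuss/361032?type=0&order=0&pos=64&page=1
+
+### 【Toutiao internship referral】Self-service progress tracking, big data development intern (full-time conversion possible) - Commercial Monetization
+https://www.nowcoder.com/discuss/363813?type=0&order=0&pos=14&page=0
+
+### 【Amazon】Software engineer (SDE II/SDE III)
+https://www.nowcoder.com/discuss/362803?type=0&order=0&pos=111&page=1
+
+### 【Alibaba】Beijing: hiring interns and campus hires graduating in 2020 or 2021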
+https://www.nowcoder.com/discuss/362551?type=0&order=0&pos=92&page=1
+
+### 【eBay Intelligent Marketing business unit】 - Software development intern
+https://www.nowcoder.com/discuss/362874?type=0&order=0&pos=65&page=1
+
+### 【shopee】Spring recruitment is open
+https://www.nowcoder.com/discuss/362129?type=0&order=0&pos=32&page=1
+
+### 【BIGO】Class-of-2021 internship referral
+https://www.nowcoder.com/discuss/362915?type=0&order=0&pos=8&page=0
+
+### 【Baidu Recommendation Strategy department (one of Baidu's two core departments)】Experienced hires + 2020 graduates (campus supplementary hiring) + interns graduating in 2021
+https://www.nowcoder.com/discuss/362298?type=0&order=0&pos=38&page=0
+
+### 【Baidu】Campus supplementary hiring + spring internship referral
+https://www.nowcoder.com/discuss/362216?type=0&order=0&pos=43&page=1
+
+### 【CVTE】Class-of-2021 intern referral
+https://www.nowcoder.com/discuss/362006?type=0&order=0&pos=77&page=1
+
+### 【Spring recruitment】Shopee R&D Center 2020 spring campus recruitment is open! Apply fast!
+https://www.nowcoder.com/discuss/362029?type=0&order=0&pos=43&page=1
+
+### 【ByteDance】
+https://www.nowcoder.com/discuss/356297?type=0&order=0&pos=50&page=1
+https://www.nowcoder.com/discuss/361626?type=0&order=0&pos=51&page=1
+https://www.nowcoder.com/discuss/361964?type=0&order=0&pos=87&page=1
+
+### 【Baidu】Hiring for client / frontend / backend / data - internship, campus, experienced
+https://www.nowcoder.com/discuss/361514?type=0&order=0&pos=109&page=1
+
+### 【Maoyan referral】Class-of-2020 supplementary hiring, positions still open in Shanghai
+https://www.nowcoder.com/discuss/318650?type=0&order=0&pos=37&page=6
+
+### 【Weibo campus recruitment】【Hangzhou】Class-of-2020 campus supplementary hiring, R&D roles
+https://www.nowcoder.com/discuss/359923?type=0&order=0&pos=23&page=1
+
+### 【Xuelang Shuzhi】Hiring big data developers / java developers (new graduates)
+https://www.nowcoder.com/discuss/345671?type=0&order=0&pos=16&page=1
+
+### 【Suning】Class-of-2020 spring early batch
+https://www.nowcoder.com/discuss/346355?type=0&order=0&pos=15&page=3
+
+### 【Hikvision】Spring recruitment is here ~~ (class-of-2020 campus, class-of-2021 internship, experienced)
+https://www.nowcoder.com/discuss/361841?type=0&order=0&pos=14&page=1
+
+### 【Guangzhou Suyou】Spring recruitment
+https://www.nowcoder.com/discuss/361662?type=0&order=0&pos=10&page=1
+
+### 【VIVO】2020 spring recruitment
+https://www.nowcoder.com/discuss/360785?type=0&order=0&pos=35&page=1
+
+### 【ThoughtWorks supplementary hiring】Class of 2020
+https://www.nowcoder.com/discuss/358055?type=0&order=0&pos=17&page=1
+
+### 【Yonyou】2020 spring recruitment
+https://www.nowcoder.com/discuss/361223?type=0&order=0&pos=12&page=2
+
+### 【Cambricon】Supplementary hiring
+https://www.nowcoder.com/discuss/361245?type=0&order=0&pos=51&page=1
+
+### 【Sangfor】Class-of-2021 interns
+https://www.nowcoder.com/discuss/360510?type=0&order=0&pos=81&page=1
+
+### 【CMB Network Technology】
+https://www.nowcoder.com/discuss/361236?type=7&order=0&pos=33&page=1
+
+### 【FanRuan】2020 spring recruitment
+https://www.nowcoder.com/discuss/361185?type=7&order=0&pos=22&page=1
+
+### 【Kuaishou】Class-of-2020 supplementary hiring
+https://www.nowcoder.com/discuss/348236?type=7&order=0&pos=56&page=5
+
+### 【Baidu autumn supplementary hiring】Smart Living Group - many positions open!
+https://www.nowcoder.com/discuss/360968?type=7&order=0&pos=68&page=1
+
+### 【Alibaba - Taobao 2020 spring recruitment, 2020 graduates or class-of-2021 interns】Hiring for all R&D roles
+https://www.nowcoder.com/discuss/361377?type=0&order=0&pos=6&page=1
+
+### 【Taobao Messaging Platform】Spring recruitment - class-of-2020 spring hires, class-of-2021 internships
+https://www.nowcoder.com/discuss/361618?type=0&order=0&pos=25&page=1
+
+### 【Sohu】【campus recruitment】Department hiring directly - regular internships, internship-to-full-time, spring recruitment
+https://www.nowcoder.com/discuss/361361
+
+### 【State-owned enterprise】【Everbright Technology】2020 spring recruitment has started! Online interviews supported!
+https://www.nowcoder.com/discuss/358457?type=0&order=0&pos=55&page=2
+
+### 【Baidu Intelligent Cloud Computing department】Internship
+https://www.nowcoder.com/discuss/356380?type=7&order=0&pos=35&page=1
+
+### 【Ping An Technology AI Center】Intern recruitment
+https://www.nowcoder.com/discuss/359518?type=7&order=0&pos=38&page=1
+
+### 【Yuanfudao】Internship
+https://www.nowcoder.com/discuss/357041?type=7&order=0&pos=39&page=1
+
+### 【2021 internships】Shumei Technology 【data mining, R&D, data analysis】
+https://www.nowcoder.com/discuss/359938?type=7&order=0&pos=50&page=1
+
+### 【SenseTime】Java development intern (Shanghai)
+https://www.nowcoder.com/discuss/361243?type=7&order=0&pos=11&page=1
+
+### 【Huya class of 2021】Intern recruitment
+https://www.nowcoder.com/discuss/360960?type=7&order=0&pos=70&page=1
+
+### 【IBM internship】Development engineer, data engineer
+https://www.nowcoder.com/discuss/360897?type=7&order=0&pos=98&page=1
+
+### 【Internship (full-time conversion possible)】【direct team referral】DiDi big data analysis intern
+https://www.nowcoder.com/discuss/332399?type=7&order=0&pos=102&page=1
\ No newline at end of file
diff --git a/README.md b/README.md
index ad8810e..ffa8611 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,128 @@
-# Big-Data-Project
-Hadoop2.x、Zookeeper、Flume、Hive、Hbase、Kafka、Spark2.x、SparkStreaming、MySQL、Hue、J2EE、websoket、Echarts
+## TALKDATA (congratulations, you found a gem of a blogger)
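+TALKDATA: a programmer at Ant who switched from a non-CS background to big data development, a Bilibili uploader focused on sharing interview experience, who has helped 500+ students get into major tech companies!
+
+You can find me here:
+
+ - Bilibili: [TALKDATA][1]
+ - WeChat official account: [TALKDATA][2]
+ - Zhihu: [TALKDATA][3]
+ - QQ group: 316916234
+
+### Sharing timeline
+![image](image/TALKDATA_share.png)
+
+### Learning path
+
+ - 【How I successfully switched from a non-CS background to big data development】-【[video][4]】【[text][5]】
+ - 【The whole journey from career switch to joining Ant】-【[text][6]】
+ - 【Learning path for Java backend and big data development】-【[video][7]】【[download][8]】
+ - 【Choosing a role: Java backend vs big data development vs algorithms】-【[video][9]】
+
+
+### Resume design
+
+ - 【How to write a resume with highlights】-【[video][10]】【[text][11]】
+ - 【Live resume revision: big data development, Java backend, and algorithm resumes】-【[video][12]】
+
+### Project presentation
+
+ - 【How to present the projects on your resume】-【[video][13]】
+ - 【Open source: big data real-time analytics and visualization system project】-【[link][14]】
+ - 【Java backend project recommendations: designing highlights and hard points】-【[video][15]】
+ - 【Big data project recommendations: adding highlights and hard points to your real-time computing and data warehouse projects】-【[video][16]】
+
+### Internships at major companies
+
+ - 【Internship planning, how to convert to full-time, and conversion defense tips】-【[video][17]】【[text][18]】
+
+### Preparing for spring and autumn recruitment
+
+ - 【The best way to submit resumes in spring recruitment】【[video][19]】【[text][20]】
+ - 【Interview patterns at major internet companies, explained】【[video][21]】
+ - 【Autumn recruitment interview experience】【[video][22]】【[text][23]】
+ - 【Interview tips for preparing for autumn recruitment on short notice】【[video][24]】
+ - 【Mock interview at a major company: watch an excellent female programmer handle the interviewer】【[video][25]】
+ - 【The most carefully compiled company roundup on the web; there is an offer that suits you】【[video][26]】
+ - 【Comparing offers: all 400k+, so hard to choose!】【[video][27]】
+
+### Interview experience
+
+ - 【How a non-CS graduate became an offer harvester at major companies】SP-level autumn offers from Alibaba Cloud / ByteDance / Baidu and others 【[video][28]】
+ - 【Column of interview write-ups from major companies: updated irregularly】【[text][29]】
+
+### Book walkthroughs
+
+ - 【Interview reading list for Java big data development】【[video][30]】【[download][31]】
+ - [Core Java, Volume I][32]
+ - [Thinking in Java][33]
+ - [Understanding the Java Virtual Machine in Depth][34]
+ - [Java High-Concurrency Programming in Practice][35]
+ - [The Art of Java Concurrency Programming][36]
+ - [Redis Design and Implementation][37]
+ - [Redis Deep Adventure: Core Principles and Applied Practice][38]
+ - [MySQL Internals][39]
+ - [Hadoop: The Definitive Guide][40]
+ - [Spark Big Data Processing Technology][41]
+ - [From Paxos to ZooKeeper: Principles and Practice of Distributed Consistency][42]
+ - [Modern Operating Systems][43]
+ - [Computer Networking: A Top-Down Approach][44]
+
+### Notes from work
+
+ - 【[What working at a major company is really like, the development of the internet industry, and future plans][45]】
+
+### Knowledge sharing
+ - 【[A one-article introduction to big data][46]】
+
+### Job-hunting QQ group
+This QQ group is for job-hunting discussion, technical discussion, and sharing TALKDATA's latest interview write-ups.
+
+![image](image/qqqun.jpg)
+
+
+ [1]: https://space.bilibili.com/326797886
+ [2]: https://mp.weixin.qq.com/s/S4-3ptU4tzRJIJaWDALEwg
+ [3]: https://www.zhihu.com/people/changeforeda
+ [4]: https://www.bilibili.com/video/BV1SJ411T7DV
+ [5]: 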
https://mp.weixin.qq.com/s?__biz=Mzg4NzAxMjIyOQ==&mid=2247483698&idx=1&sn=7a4a698356bba0547135e7203a6f2bab&chksm=cf91afd8f8e626cedc546403ac9bdfb838bf1328742c705c03d8970978c53e2cc8580d4e904d&scene=178&cur_album_id=1417724228044210177#rd + [6]: https://mp.weixin.qq.com/s/WBiPD_86XkMkepIN4J9euQ + [7]: https://www.bilibili.com/video/BV1HV411d7t6 + [8]: https://mp.weixin.qq.com/s?__biz=Mzg4NzAxMjIyOQ==&mid=2247483818&idx=1&sn=8d6643e2bc280d07e259f451ce998207&chksm=cf91af40f8e626560eb164bd5e46984da653efa3bf6f38e6b70fa99cc181027052f776e4784c&scene=178&cur_album_id=1417724228044210177#rd + [9]: https://www.bilibili.com/video/BV1sJ41127EP + [10]: https://www.bilibili.com/video/BV1XE411P7s8 + [11]: https://mp.weixin.qq.com/s?__biz=Mzg4NzAxMjIyOQ==&mid=2247483717&idx=1&sn=3ed1c1fcb27fc4c55abb563f0840b103&chksm=cf91afaff8e626b9c7863c3306284b4e6a29d7718f4691db628bb354c48c8907e55e8a3fa8a0&scene=178&cur_album_id=1417724228044210177#rd + [12]: https://www.bilibili.com/video/BV1nJ411h718 + [13]: https://www.bilibili.com/video/BV1XE411P7s8 + [14]: https://github.com/TALKDATA/JavaBigData/blob/master/news-project.md + [15]: https://www.bilibili.com/video/BV1VT4y1r71g + [16]: https://www.bilibili.com/video/BV193411K77p + [17]: https://www.bilibili.com/video/BV1hQ4y1d7MF + [18]: https://mp.weixin.qq.com/s?__biz=Mzg4NzAxMjIyOQ==&mid=2247483842&idx=1&sn=e51a6cd0d13650fbebe360f4ee63437b&chksm=cf91af28f8e6263e0ccc3fd2d623f8d44fe4d82bc5b412aaee54739412eff1c932801de757e5&scene=178&cur_album_id=1417724228044210177#rd + [19]: https://www.bilibili.com/video/BV1y7411K74s + [20]: https://mp.weixin.qq.com/s?__biz=Mzg4NzAxMjIyOQ==&mid=2247483755&idx=1&sn=01b0368084267dee649d6fa26954df4d&chksm=cf91af81f8e62697a03335c94357c9a562e31f3e837a887e9395a3b2e4c415a81423da120a52&scene=178&cur_album_id=1417724228044210177#rd + [21]: https://www.bilibili.com/video/BV1sK4y1T745 + [22]: https://www.bilibili.com/video/BV12T4y177hj + [23]: 
https://mp.weixin.qq.com/s?__biz=Mzg4NzAxMjIyOQ==&mid=2247483923&idx=1&sn=844f925fb21a742aa73e40063166c862&chksm=cf91acf9f8e625ef2e20bfd35437d5e5d19777b57ff37f52eebec6b85ca1c2bb2a98a30a9a98&scene=178&cur_album_id=1417724228044210177#rd
+ [24]: https://www.bilibili.com/video/BV1Rb4y1k7Mc
+ [25]: https://www.bilibili.com/video/BV1Q7411U7ex
+ [26]: https://www.bilibili.com/video/BV1d34y1x7ts
+ [27]: https://www.bilibili.com/video/BV1Dh411t7Bq
+ [28]: https://www.bilibili.com/video/BV1p34y1x7wa
+ [29]: https://mp.weixin.qq.com/mp/appmsgalbum?action=getalbum&album_id=1417724228044210177&__biz=Mzg4NzAxMjIyOQ==&uin=&key=&devicetype=Windows%2010%20x64&version=63030073&lang=zh_CN&ascene=7&fontgear=2
+ [30]: https://www.bilibili.com/video/BV1UT4y137M9
+ [31]: https://mp.weixin.qq.com/s/Frefd9h1t_J8xyUihQfdig
+ [32]: https://www.bilibili.com/video/BV1Ma411w714
+ [33]: https://www.bilibili.com/video/BV1mE411B7PT
+ [34]: https://www.bilibili.com/video/BV1yE411R7co
+ [35]: https://www.bilibili.com/video/BV1ZE411S79m
+ [36]: https://www.bilibili.com/video/BV1AC4y1h78E
+ [37]: https://www.bilibili.com/video/BV1WE411f7fo
+ [38]: https://www.bilibili.com/video/BV1aE411o7Fk
+ [39]: https://www.bilibili.com/video/BV1CJ411t7Ku
+ [40]: https://www.bilibili.com/video/BV1DE411r7Fn
+ [41]: https://www.bilibili.com/video/BV1pJ411W7P5
+ [42]: https://www.bilibili.com/video/BV1EJ411L7AU
+ [43]: https://www.bilibili.com/video/BV1xJ411p7db
+ [44]: https://www.bilibili.com/video/BV1t7411q78v
+ [45]: https://www.bilibili.com/video/BV1DL4y1E7oo
+ [46]: https://mp.weixin.qq.com/s?__biz=Mzg4NzAxMjIyOQ==&mid=2247484509&idx=1&sn=427238d3f417911fa14f58d0092bc242&chksm=cf91aab7f8e623a1fa36bcb87b6fe90b41d20c924f52b2be59d2c09a145c96c45e8066e60613&token=1514526987&lang=zh_CN#rd
\ No newline at end of file
diff --git a/code/DataProducer/.idea/artifacts/DataProducer_jar.xml b/code/DataProducer/.idea/artifacts/DataProducer_jar.xml
new file mode 100644
index 0000000..f034a42
--- 
/dev/null +++ b/code/DataProducer/.idea/artifacts/DataProducer_jar.xml @@ -0,0 +1,8 @@ + + + $PROJECT_DIR$/out/artifacts/DataProducer_jar + + + + + \ No newline at end of file diff --git a/code/DataProducer/.idea/artifacts/DataProducer_jar2.xml b/code/DataProducer/.idea/artifacts/DataProducer_jar2.xml new file mode 100644 index 0000000..596a7be --- /dev/null +++ b/code/DataProducer/.idea/artifacts/DataProducer_jar2.xml @@ -0,0 +1,8 @@ + + + $PROJECT_DIR$/out/artifacts/DataProducer_jar2 + + + + + \ No newline at end of file diff --git a/code/DataProducer/.idea/artifacts/DataProducer_jar3.xml b/code/DataProducer/.idea/artifacts/DataProducer_jar3.xml new file mode 100644 index 0000000..c23ab96 --- /dev/null +++ b/code/DataProducer/.idea/artifacts/DataProducer_jar3.xml @@ -0,0 +1,8 @@ + + + $PROJECT_DIR$/out/artifacts/DataProducer_jar3 + + + + + \ No newline at end of file diff --git a/code/DataProducer/.idea/misc.xml b/code/DataProducer/.idea/misc.xml new file mode 100644 index 0000000..0548357 --- /dev/null +++ b/code/DataProducer/.idea/misc.xml @@ -0,0 +1,6 @@ + + + + + + \ No newline at end of file diff --git a/code/DataProducer/.idea/modules.xml b/code/DataProducer/.idea/modules.xml new file mode 100644 index 0000000..1b3af89 --- /dev/null +++ b/code/DataProducer/.idea/modules.xml @@ -0,0 +1,8 @@ + + + + + + + + \ No newline at end of file diff --git a/code/DataProducer/.idea/workspace.xml b/code/DataProducer/.idea/workspace.xml new file mode 100644 index 0000000..daa8adc --- /dev/null +++ b/code/DataProducer/.idea/workspace.xml @@ -0,0 +1,276 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + true + DEFINITION_ORDER + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 1548571318643 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + TestSpark:jar + + + + + + + + Web (TestSpark)|Web + + + + + + + + + + + + + + + 1.8 + + + + + + + + TestSpark + + + + + + + + 1.8 + + + + + + + + scala-sdk-2.11.12 + + + + + + + + \ No newline at end of file diff --git a/code/TestSpark/TestSpark.iml b/code/TestSpark/TestSpark.iml new file mode 100644 index 0000000..78b2cc5 --- /dev/null +++ b/code/TestSpark/TestSpark.iml @@ -0,0 +1,2 @@ + + \ No newline at end of file diff --git a/code/TestSpark/out/artifacts/TestSpark_jar/TestSpark.jar b/code/TestSpark/out/artifacts/TestSpark_jar/TestSpark.jar new file mode 100644 index 0000000..eb7e126 Binary files /dev/null and b/code/TestSpark/out/artifacts/TestSpark_jar/TestSpark.jar differ diff --git a/code/TestSpark/pom.xml b/code/TestSpark/pom.xml new file mode 100644 index 0000000..a4df494 --- /dev/null +++ b/code/TestSpark/pom.xml @@ -0,0 +1,53 @@ + + + + + 4.0.0 + war + + TestSpark + com.kfk.spark + TestSpark + 1.0-SNAPSHOT + + + + 2.11.12 + 2.11 + 2.2.0 + + + + + org.apache.spark + spark-core_${scala.binary.version} + ${spark.version} + + + org.apache.spark + spark-streaming_${scala.binary.version} + ${spark.version} + + + org.apache.spark + spark-sql_${scala.binary.version} + ${spark.version} + + + org.apache.spark + spark-hive_${scala.binary.version} + ${spark.version} + + + org.apache.spark + spark-streaming-kafka-0-10_${scala.binary.version} + ${spark.version} + + + org.apache.hadoop + hadoop-client + 2.6.0 + + + + diff --git a/code/TestSpark/src/main/resources/META-INF/MANIFEST.MF b/code/TestSpark/src/main/resources/META-INF/MANIFEST.MF new file mode 100644 index 0000000..c4cd13b --- /dev/null +++ b/code/TestSpark/src/main/resources/META-INF/MANIFEST.MF @@ -0,0 +1,3 @@ +Manifest-Version: 1.0 +Main-Class: test + diff --git a/code/TestSpark/src/main/scala/TestStreaming.scala b/code/TestSpark/src/main/scala/TestStreaming.scala new file mode 100644 index 0000000..75aa35a --- /dev/null +++ b/code/TestSpark/src/main/scala/TestStreaming.scala @@ -0,0 +1,23 @@ 
+import org.apache.spark.SparkConf
+import org.apache.spark.streaming.{Seconds, StreamingContext}
+
+
+object TestStreaming {
+
+  def main(args: Array[String]): Unit = {
+
+    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
+    val ssc = new StreamingContext(conf, Seconds(5))
+
+
+    val lines = ssc.socketTextStream("bigdata-pro01.kfk.com",9999)
+    val words = lines.flatMap(_.split(" "))
+    // map-reduce computation
+    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
+    wordCounts.print()
+    ssc.start()
+    ssc.awaitTermination()
+
+  }
+
+}
diff --git a/code/TestSpark/src/main/scala/test.scala b/code/TestSpark/src/main/scala/test.scala
new file mode 100644
index 0000000..04cc42d
--- /dev/null
+++ b/code/TestSpark/src/main/scala/test.scala
@@ -0,0 +1,20 @@
+import org.apache.spark.sql.SparkSession
+
+object test {
+  def main(args: Array[String]): Unit = {
+
+    val spark = SparkSession
+      .builder
+      .master("yarn-cluster")
+      // .master("local[2]")
+      .appName("HdfsTest")
+      .getOrCreate()
+
+    val path = args(0)
+    val out = args(1)
+
+    val rdd = spark.sparkContext.textFile(path)
+    val lines = rdd.flatMap(_.split(" ")).map(x=>(x,1)).reduceByKey((a,b)=>(a+b)).saveAsTextFile(out)
+  }
+
+}
diff --git a/code/TestSpark/src/main/webapp/WEB-INF/applicationContext.xml b/code/TestSpark/src/main/webapp/WEB-INF/applicationContext.xml
new file mode 100644
index 0000000..9410604
--- /dev/null
+++ b/code/TestSpark/src/main/webapp/WEB-INF/applicationContext.xml
@@ -0,0 +1,43 @@
+ + + + + + + + + + + + + + + + + +
diff --git a/code/TestSpark/src/main/webapp/WEB-INF/log4j.xml b/code/TestSpark/src/main/webapp/WEB-INF/log4j.xml
new file mode 100644
index 0000000..edb3767
--- /dev/null
+++ b/code/TestSpark/src/main/webapp/WEB-INF/log4j.xml
@@ -0,0 +1,38 @@
+ + + + + + + + + + + + + + + + + + +
diff --git a/code/TestSpark/src/main/webapp/WEB-INF/web.xml b/code/TestSpark/src/main/webapp/WEB-INF/web.xml
new file mode 100644
index 0000000..208b385
--- /dev/null
+++ 
b/code/TestSpark/src/main/webapp/WEB-INF/web.xml @@ -0,0 +1,119 @@ + + + + + + + + + + + Multipart MIME handling filter for Cocoon + Cocoon multipart filter + CocoonMultipartFilter + org.apache.cocoon.servlet.multipart.MultipartFilter + + + + + Log debug information about each request + Cocoon debug filter + CocoonDebugFilter + org.apache.cocoon.servlet.DebugFilter + + + + + + + CocoonMultipartFilter + Cocoon + + + CocoonMultipartFilter + DispatcherServlet + + + + + + + + + org.springframework.web.context.ContextLoaderListener + + + + + org.springframework.web.context.request.RequestContextListener + + + + + + + Cocoon blocks dispatcher + DispatcherServlet + DispatcherServlet + org.apache.cocoon.servletservice.DispatcherServlet + 1 + + + + + + + DispatcherServlet + /* + + + + \ No newline at end of file diff --git a/code/TestSpark/target/classes/META-INF/MANIFEST.MF b/code/TestSpark/target/classes/META-INF/MANIFEST.MF new file mode 100644 index 0000000..c4cd13b --- /dev/null +++ b/code/TestSpark/target/classes/META-INF/MANIFEST.MF @@ -0,0 +1,3 @@ +Manifest-Version: 1.0 +Main-Class: test + diff --git a/code/TestSpark/target/classes/TestStreaming$$anonfun$1.class b/code/TestSpark/target/classes/TestStreaming$$anonfun$1.class new file mode 100644 index 0000000..5b52d7a Binary files /dev/null and b/code/TestSpark/target/classes/TestStreaming$$anonfun$1.class differ diff --git a/code/TestSpark/target/classes/TestStreaming$$anonfun$2.class b/code/TestSpark/target/classes/TestStreaming$$anonfun$2.class new file mode 100644 index 0000000..46565dc Binary files /dev/null and b/code/TestSpark/target/classes/TestStreaming$$anonfun$2.class differ diff --git a/code/TestSpark/target/classes/TestStreaming$$anonfun$3.class b/code/TestSpark/target/classes/TestStreaming$$anonfun$3.class new file mode 100644 index 0000000..4dc03d9 Binary files /dev/null and b/code/TestSpark/target/classes/TestStreaming$$anonfun$3.class differ diff --git 
a/code/TestSpark/target/classes/TestStreaming$.class b/code/TestSpark/target/classes/TestStreaming$.class new file mode 100644 index 0000000..e9bb3ba Binary files /dev/null and b/code/TestSpark/target/classes/TestStreaming$.class differ diff --git a/code/TestSpark/target/classes/TestStreaming.class b/code/TestSpark/target/classes/TestStreaming.class new file mode 100644 index 0000000..5fee3cc Binary files /dev/null and b/code/TestSpark/target/classes/TestStreaming.class differ diff --git a/code/TestSpark/target/classes/test$$anonfun$1.class b/code/TestSpark/target/classes/test$$anonfun$1.class new file mode 100644 index 0000000..05fd425 Binary files /dev/null and b/code/TestSpark/target/classes/test$$anonfun$1.class differ diff --git a/code/TestSpark/target/classes/test$$anonfun$2.class b/code/TestSpark/target/classes/test$$anonfun$2.class new file mode 100644 index 0000000..ca0bfac Binary files /dev/null and b/code/TestSpark/target/classes/test$$anonfun$2.class differ diff --git a/code/TestSpark/target/classes/test$$anonfun$3.class b/code/TestSpark/target/classes/test$$anonfun$3.class new file mode 100644 index 0000000..8f75436 Binary files /dev/null and b/code/TestSpark/target/classes/test$$anonfun$3.class differ diff --git a/code/TestSpark/target/classes/test$.class b/code/TestSpark/target/classes/test$.class new file mode 100644 index 0000000..f9498db Binary files /dev/null and b/code/TestSpark/target/classes/test$.class differ diff --git a/code/TestSpark/target/classes/test.class b/code/TestSpark/target/classes/test.class new file mode 100644 index 0000000..4d7c83c Binary files /dev/null and b/code/TestSpark/target/classes/test.class differ diff --git a/code/flume-ng-sinks/flume-dataset-sink/pom.xml b/code/flume-ng-sinks/flume-dataset-sink/pom.xml new file mode 100644 index 0000000..1e8a07b --- /dev/null +++ b/code/flume-ng-sinks/flume-dataset-sink/pom.xml @@ -0,0 +1,145 @@ + + + + + 4.0.0 + + + flume-ng-sinks + org.apache.flume + 1.7.0 + + + 
org.apache.flume.flume-ng-sinks + flume-dataset-sink + Flume NG Kite Dataset Sink + + + + + org.apache.rat + apache-rat-plugin + + + org.apache.felix + maven-bundle-plugin + 2.3.7 + true + true + + + + + + + + org.apache.flume + flume-ng-sdk + + + + org.apache.flume + flume-ng-configuration + + + + org.apache.flume + flume-ng-core + + + + org.kitesdk + kite-data-core + + + + org.kitesdk + kite-data-hive + + + + org.kitesdk + kite-data-hbase + + + + org.apache.avro + avro + + + + org.apache.hive + hive-exec + true + + + + org.apache.hive + hive-metastore + true + + + + + org.apache.hadoop + hadoop-common + ${hadoop2.version} + true + + + + org.slf4j + slf4j-api + + + + com.google.guava + guava + + + + junit + junit + test + + + + org.apache.hadoop + hadoop-minicluster + ${hadoop2.version} + test + + + + org.slf4j + slf4j-log4j12 + test + + + + org.mockito + mockito-all + test + + + + + diff --git a/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/DatasetSink.java b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/DatasetSink.java new file mode 100644 index 0000000..fa31262 --- /dev/null +++ b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/DatasetSink.java @@ -0,0 +1,582 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.flume.sink.kite; + +import org.apache.flume.auth.FlumeAuthenticationUtil; +import org.apache.flume.auth.PrivilegedExecutor; +import org.apache.flume.sink.kite.parser.EntityParserFactory; +import org.apache.flume.sink.kite.parser.EntityParser; +import org.apache.flume.sink.kite.policy.FailurePolicy; +import org.apache.flume.sink.kite.policy.FailurePolicyFactory; +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Preconditions; +import com.google.common.base.Throwables; +import com.google.common.collect.Lists; + +import java.net.URI; +import java.security.PrivilegedAction; +import java.util.List; +import java.util.concurrent.TimeUnit; +import org.apache.avro.Schema; +import org.apache.avro.file.DataFileWriter; +import org.apache.avro.generic.GenericRecord; +import org.apache.flume.Channel; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.EventDeliveryException; +import org.apache.flume.Transaction; +import org.apache.flume.conf.Configurable; +import org.apache.flume.instrumentation.SinkCounter; +import org.apache.flume.sink.AbstractSink; +import org.kitesdk.data.Dataset; +import org.kitesdk.data.DatasetDescriptor; +import org.kitesdk.data.DatasetIOException; +import org.kitesdk.data.DatasetNotFoundException; +import org.kitesdk.data.DatasetWriter; +import org.kitesdk.data.Datasets; +import org.kitesdk.data.Flushable; +import org.kitesdk.data.Syncable; +import org.kitesdk.data.View; +import org.kitesdk.data.spi.Registration; +import org.kitesdk.data.URIBuilder; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import static org.apache.flume.sink.kite.DatasetSinkConstants.*; +import org.kitesdk.data.Format; +import org.kitesdk.data.Formats; + +/** + * Sink that writes events to a Kite Dataset. 
This sink will parse the body of + * each incoming event and store the resulting entity in a Kite Dataset. It + * determines the destination Dataset by opening a dataset URI + * {@code kite.dataset.uri} or opening a repository URI, {@code kite.repo.uri}, + * and loading a Dataset by name, {@code kite.dataset.name}, and namespace, + * {@code kite.dataset.namespace}. + */ +public class DatasetSink extends AbstractSink implements Configurable { + + private static final Logger LOG = LoggerFactory.getLogger(DatasetSink.class); + + private Context context = null; + private PrivilegedExecutor privilegedExecutor; + + private String datasetName = null; + private URI datasetUri = null; + private Schema datasetSchema = null; + private DatasetWriter writer = null; + + /** + * The number of events to process as a single batch. + */ + private long batchSize = DEFAULT_BATCH_SIZE; + + /** + * The number of seconds to wait before rolling a writer. + */ + private int rollIntervalSeconds = DEFAULT_ROLL_INTERVAL; + + /** + * Flag that says if Flume should commit on every batch. + */ + private boolean commitOnBatch = DEFAULT_FLUSHABLE_COMMIT_ON_BATCH; + + /** + * Flag that says if Flume should sync on every batch. + */ + private boolean syncOnBatch = DEFAULT_SYNCABLE_SYNC_ON_BATCH; + + /** + * The last time the writer rolled. + */ + private long lastRolledMillis = 0L; + + /** + * The raw number of bytes parsed. + */ + private long bytesParsed = 0L; + + /** + * A class for parsing Kite entities from Flume Events. + */ + private EntityParser parser = null; + + /** + * A class implementing a failure newPolicy for events that had a + non-recoverable error during processing. + */ + private FailurePolicy failurePolicy = null; + + private SinkCounter counter = null; + + /** + * The Kite entity + */ + private GenericRecord entity = null; + // TODO: remove this after PARQUET-62 is released + private boolean reuseEntity = true; + + /** + * The Flume transaction. 
Used to keep transactions open across calls to + * process. + */ + private Transaction transaction = null; + + /** + * Internal flag on if there has been a batch of records committed. This is + * used during rollback to know if the current writer needs to be closed. + */ + private boolean committedBatch = false; + + // Factories + private static final EntityParserFactory ENTITY_PARSER_FACTORY = + new EntityParserFactory(); + private static final FailurePolicyFactory FAILURE_POLICY_FACTORY = + new FailurePolicyFactory(); + + /** + * Return the list of allowed formats. + * @return The list of allowed formats. + */ + protected List allowedFormats() { + return Lists.newArrayList("avro", "parquet"); + } + + @Override + public void configure(Context context) { + this.context = context; + + String principal = context.getString(AUTH_PRINCIPAL); + String keytab = context.getString(AUTH_KEYTAB); + String effectiveUser = context.getString(AUTH_PROXY_USER); + + this.privilegedExecutor = FlumeAuthenticationUtil.getAuthenticator( + principal, keytab).proxyAs(effectiveUser); + + // Get the dataset URI and name from the context + String datasetURI = context.getString(CONFIG_KITE_DATASET_URI); + if (datasetURI != null) { + this.datasetUri = URI.create(datasetURI); + this.datasetName = uriToName(datasetUri); + } else { + String repositoryURI = context.getString(CONFIG_KITE_REPO_URI); + Preconditions.checkNotNull(repositoryURI, "No dataset configured. Setting " + + CONFIG_KITE_DATASET_URI + " is required."); + + this.datasetName = context.getString(CONFIG_KITE_DATASET_NAME); + Preconditions.checkNotNull(datasetName, "No dataset configured. 
Setting " + + CONFIG_KITE_DATASET_URI + " is required."); + + String namespace = context.getString(CONFIG_KITE_DATASET_NAMESPACE, + DEFAULT_NAMESPACE); + + this.datasetUri = new URIBuilder(repositoryURI, namespace, datasetName) + .build(); + } + this.setName(datasetUri.toString()); + + if (context.getBoolean(CONFIG_SYNCABLE_SYNC_ON_BATCH, + DEFAULT_SYNCABLE_SYNC_ON_BATCH)) { + Preconditions.checkArgument( + context.getBoolean(CONFIG_FLUSHABLE_COMMIT_ON_BATCH, + DEFAULT_FLUSHABLE_COMMIT_ON_BATCH), "Configuration error: " + + CONFIG_FLUSHABLE_COMMIT_ON_BATCH + " must be set to true when " + + CONFIG_SYNCABLE_SYNC_ON_BATCH + " is set to true."); + } + + // Create the configured failure failurePolicy + this.failurePolicy = FAILURE_POLICY_FACTORY.newPolicy(context); + + // other configuration + this.batchSize = context.getLong(CONFIG_KITE_BATCH_SIZE, + DEFAULT_BATCH_SIZE); + this.rollIntervalSeconds = context.getInteger(CONFIG_KITE_ROLL_INTERVAL, + DEFAULT_ROLL_INTERVAL); + + this.counter = new SinkCounter(datasetName); + } + + @Override + public synchronized void start() { + this.lastRolledMillis = System.currentTimeMillis(); + counter.start(); + // signal that this sink is ready to process + LOG.info("Started DatasetSink " + getName()); + super.start(); + } + + /** + * Causes the sink to roll at the next {@link #process()} call. 
+ */ + @VisibleForTesting + void roll() { + this.lastRolledMillis = 0L; + } + + @VisibleForTesting + DatasetWriter getWriter() { + return writer; + } + + @VisibleForTesting + void setWriter(DatasetWriter writer) { + this.writer = writer; + } + + @VisibleForTesting + void setParser(EntityParser parser) { + this.parser = parser; + } + + @VisibleForTesting + void setFailurePolicy(FailurePolicy failurePolicy) { + this.failurePolicy = failurePolicy; + } + + @Override + public synchronized void stop() { + counter.stop(); + + try { + // Close the writer and commit the transaction, but don't create a new + // writer since we're stopping + closeWriter(); + commitTransaction(); + } catch (EventDeliveryException ex) { + rollbackTransaction(); + + LOG.warn("Closing the writer failed: " + ex.getLocalizedMessage()); + LOG.debug("Exception follows.", ex); + // We don't propagate the exception as the transaction would have been + // rolled back and we can still finish stopping + } + + // signal that this sink has stopped + LOG.info("Stopped dataset sink: " + getName()); + super.stop(); + } + + @Override + public Status process() throws EventDeliveryException { + long processedEvents = 0; + + try { + if (shouldRoll()) { + closeWriter(); + commitTransaction(); + createWriter(); + } + + // The writer shouldn't be null at this point + Preconditions.checkNotNull(writer, + "Can't process events with a null writer. This is likely a bug."); + Channel channel = getChannel(); + + // Enter the transaction boundary if we haven't already + enterTransaction(channel); + + for (; processedEvents < batchSize; processedEvents += 1) { + Event event = channel.take(); + + if (event == null) { + // no events available in the channel + break; + } + + write(event); + } + + // commit transaction + if (commitOnBatch) { + // Flush/sync before committing. 
A failure here will result in rolling back + // the transaction + if (syncOnBatch && writer instanceof Syncable) { + ((Syncable) writer).sync(); + } else if (writer instanceof Flushable) { + ((Flushable) writer).flush(); + } + boolean committed = commitTransaction(); + Preconditions.checkState(committed, + "Tried to commit a batch when there was no transaction"); + committedBatch |= committed; + } + } catch (Throwable th) { + // catch-all for any unhandled Throwable so that the transaction is + // correctly rolled back. + rollbackTransaction(); + + if (commitOnBatch && committedBatch) { + try { + closeWriter(); + } catch (EventDeliveryException ex) { + LOG.warn("Error closing writer there may be temp files that need to" + + " be manually recovered: " + ex.getLocalizedMessage()); + LOG.debug("Exception follows.", ex); + } + } else { + this.writer = null; + } + + // handle the exception + Throwables.propagateIfInstanceOf(th, Error.class); + Throwables.propagateIfInstanceOf(th, EventDeliveryException.class); + throw new EventDeliveryException(th); + } + + if (processedEvents == 0) { + counter.incrementBatchEmptyCount(); + return Status.BACKOFF; + } else if (processedEvents < batchSize) { + counter.incrementBatchUnderflowCount(); + } else { + counter.incrementBatchCompleteCount(); + } + + counter.addToEventDrainSuccessCount(processedEvents); + + return Status.READY; + } + + /** + * Parse the event using the entity parser and write the entity to the dataset. + * + * @param event The event to write + * @throws EventDeliveryException An error occurred trying to write to the + dataset that couldn't or shouldn't be + handled by the failure policy. + */ + @VisibleForTesting + void write(Event event) throws EventDeliveryException { + try { + this.entity = parser.parse(event, reuseEntity ? 
entity : null); + this.bytesParsed += event.getBody().length; + + // writeEncoded would be an optimization in some cases, but HBase + // will not support it and partitioned Datasets need to get partition + // info from the entity Object. We may be able to avoid the + // serialization round-trip otherwise. + writer.write(entity); + } catch (NonRecoverableEventException ex) { + failurePolicy.handle(event, ex); + } catch (DataFileWriter.AppendWriteException ex) { + failurePolicy.handle(event, ex); + } catch (RuntimeException ex) { + Throwables.propagateIfInstanceOf(ex, EventDeliveryException.class); + throw new EventDeliveryException(ex); + } + } + + /** + * Create a new writer. + * + * This method also re-loads the dataset so updates to the configuration or + * a dataset created after Flume starts will be loaded. + * + * @throws EventDeliveryException There was an error creating the writer. + */ + @VisibleForTesting + void createWriter() throws EventDeliveryException { + // reset the committed flag whenever a new writer is created + committedBatch = false; + try { + View<GenericRecord> view; + + view = privilegedExecutor.execute( + new PrivilegedAction<Dataset<GenericRecord>>() { + @Override + public Dataset<GenericRecord> run() { + return Datasets.load(datasetUri); + } + }); + + DatasetDescriptor descriptor = view.getDataset().getDescriptor(); + Format format = descriptor.getFormat(); + Preconditions.checkArgument(allowedFormats().contains(format.getName()), + "Unsupported format: " + format.getName()); + + Schema newSchema = descriptor.getSchema(); + if (datasetSchema == null || !newSchema.equals(datasetSchema)) { + this.datasetSchema = descriptor.getSchema(); + // dataset schema has changed, create a new parser + parser = ENTITY_PARSER_FACTORY.newParser(datasetSchema, context); + } + + this.reuseEntity = !(Formats.PARQUET.equals(format)); + + // TODO: Check that the format implements Flushable after CDK-863 + // goes in. 
For now, just check that the Dataset is Avro format + this.commitOnBatch = context.getBoolean(CONFIG_FLUSHABLE_COMMIT_ON_BATCH, + DEFAULT_FLUSHABLE_COMMIT_ON_BATCH) && (Formats.AVRO.equals(format)); + + // TODO: Check that the format implements Syncable after CDK-863 + // goes in. For now, just check that the Dataset is Avro format + this.syncOnBatch = context.getBoolean(CONFIG_SYNCABLE_SYNC_ON_BATCH, + DEFAULT_SYNCABLE_SYNC_ON_BATCH) && (Formats.AVRO.equals(format)); + + this.datasetName = view.getDataset().getName(); + + this.writer = view.newWriter(); + + // Reset the last rolled time and the metrics + this.lastRolledMillis = System.currentTimeMillis(); + this.bytesParsed = 0L; + } catch (DatasetNotFoundException ex) { + throw new EventDeliveryException("Dataset " + datasetUri + " not found." + + " The dataset must be created before Flume can write to it.", ex); + } catch (RuntimeException ex) { + throw new EventDeliveryException("Error trying to open a new" + + " writer for dataset " + datasetUri, ex); + } + } + + /** + * Return true if the sink should roll the writer. + * + * Currently, this is based on time since the last roll or if the current + * writer is null. + * + * @return True if and only if the sink should roll the writer + */ + private boolean shouldRoll() { + long currentTimeMillis = System.currentTimeMillis(); + long elapsedTimeSeconds = TimeUnit.MILLISECONDS.toSeconds( + currentTimeMillis - lastRolledMillis); + + LOG.debug("Current time: {}, lastRolled: {}, diff: {} sec", + new Object[] {currentTimeMillis, lastRolledMillis, elapsedTimeSeconds}); + + return elapsedTimeSeconds >= rollIntervalSeconds || writer == null; + } + + /** + * Close the current writer. + * + * This method always sets the current writer to null even if close fails. + * If this method throws an Exception, callers *must* rollback any active + * transaction to ensure that data is replayed. 
+ * + * @throws EventDeliveryException If the writer could not be closed. + */ + @VisibleForTesting + void closeWriter() throws EventDeliveryException { + if (writer != null) { + try { + writer.close(); + + long elapsedTimeSeconds = TimeUnit.MILLISECONDS.toSeconds( + System.currentTimeMillis() - lastRolledMillis); + LOG.info("Closed writer for {} after {} seconds and {} bytes parsed", + new Object[]{datasetUri, elapsedTimeSeconds, bytesParsed}); + } catch (DatasetIOException ex) { + throw new EventDeliveryException("Check HDFS permissions/health. IO" + + " error trying to close the writer for dataset " + datasetUri, + ex); + } catch (RuntimeException ex) { + throw new EventDeliveryException("Error trying to close the writer for" + + " dataset " + datasetUri, ex); + } finally { + // If we failed to close the writer then we give up on it as we'll + // end up throwing an EventDeliveryException which will result in + // a transaction rollback and a replay of any events written during + // the current transaction. If commitOnBatch is true, you can still + // end up with orphaned temp files that have data to be recovered. + this.writer = null; + failurePolicy.close(); + } + } + } + + /** + * Enter the transaction boundary. This begins a new transaction if one + * doesn't already exist. If we're already in a transaction boundary, + * then this method does nothing. + * + * @param channel The Sink's channel + * @throws EventDeliveryException There was an error starting a new batch + * with the failure policy. + */ + private void enterTransaction(Channel channel) throws EventDeliveryException { + // There's no synchronization around the transaction instance because the + // Sink API states "the Sink#process() call is guaranteed to only + // be accessed by a single thread". 
Technically other methods could be + // called concurrently, but the implementation of SinkRunner waits + // for the Thread running process() to end before calling stop() + if (transaction == null) { + this.transaction = channel.getTransaction(); + transaction.begin(); + failurePolicy = FAILURE_POLICY_FACTORY.newPolicy(context); + } + } + + /** + * Commit and close the transaction. + * + * If this method throws an Exception the caller *must* ensure that the + * transaction is rolled back. Callers can roll back the transaction by + * calling {@link #rollbackTransaction()}. + * + * @return True if there was an open transaction and it was committed, false + * otherwise. + * @throws EventDeliveryException There was an error ending the batch with + * the failure policy. + */ + @VisibleForTesting + boolean commitTransaction() throws EventDeliveryException { + if (transaction != null) { + failurePolicy.sync(); + transaction.commit(); + transaction.close(); + this.transaction = null; + return true; + } else { + return false; + } + } + + /** + * Rollback the transaction. If there is a RuntimeException during rollback, + * it will be logged but the transaction instance variable will still be + * nullified. + */ + private void rollbackTransaction() { + if (transaction != null) { + try { + // If the transaction wasn't committed before we got the exception, we + // need to rollback. 
+ transaction.rollback(); + } catch (RuntimeException ex) { + LOG.error("Transaction rollback failed: " + ex.getLocalizedMessage()); + LOG.debug("Exception follows.", ex); + } finally { + transaction.close(); + this.transaction = null; + } + } + } + + /** + * Get the name of the dataset from the URI + * + * @param uri The dataset or view URI + * @return The dataset name + */ + private static String uriToName(URI uri) { + return Registration.lookupDatasetUri(URI.create( + uri.getRawSchemeSpecificPart())).second().get("dataset"); + } +} diff --git a/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/DatasetSinkConstants.java b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/DatasetSinkConstants.java new file mode 100644 index 0000000..af33304 --- /dev/null +++ b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/DatasetSinkConstants.java @@ -0,0 +1,132 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flume.sink.kite; + +import org.kitesdk.data.URIBuilder; + +public class DatasetSinkConstants { + /** + * URI of the Kite Dataset + */ + public static final String CONFIG_KITE_DATASET_URI = "kite.dataset.uri"; + + /** + * URI of the Kite DatasetRepository. + */ + public static final String CONFIG_KITE_REPO_URI = "kite.repo.uri"; + + /** + * Name of the Kite Dataset to write into. + */ + public static final String CONFIG_KITE_DATASET_NAME = "kite.dataset.name"; + + /** + * Namespace of the Kite Dataset to write into. + */ + public static final String CONFIG_KITE_DATASET_NAMESPACE = + "kite.dataset.namespace"; + public static final String DEFAULT_NAMESPACE = URIBuilder.NAMESPACE_DEFAULT; + + /** + * Number of records to process from the incoming channel per call to process. + */ + public static final String CONFIG_KITE_BATCH_SIZE = "kite.batchSize"; + public static long DEFAULT_BATCH_SIZE = 100; + + /** + * Maximum time to wait before finishing files. + */ + public static final String CONFIG_KITE_ROLL_INTERVAL = "kite.rollInterval"; + public static int DEFAULT_ROLL_INTERVAL = 30; // seconds + + /** + * Flag for committing the Flume transaction on each batch for Flushable + * datasets. When set to false, Flume will only commit the transaction when + * roll interval has expired. Setting this to false requires enough space + * in the channel to handle all events delivered during the roll interval. + * Defaults to true. + */ + public static final String CONFIG_FLUSHABLE_COMMIT_ON_BATCH = + "kite.flushable.commiteOnBatch"; + public static boolean DEFAULT_FLUSHABLE_COMMIT_ON_BATCH = true; + + /** + * Flag for syncing the DatasetWriter on each batch for Syncable + * datasets. Defaults to true. + */ + public static final String CONFIG_SYNCABLE_SYNC_ON_BATCH = + "kite.syncable.syncOnBatch"; + public static boolean DEFAULT_SYNCABLE_SYNC_ON_BATCH = true; + + /** + * Parser used to parse Flume Events into Kite entities. 
+ */ + public static final String CONFIG_ENTITY_PARSER = "kite.entityParser"; + + /** + * Built-in entity parsers + */ + public static final String AVRO_ENTITY_PARSER = "avro"; + public static final String DEFAULT_ENTITY_PARSER = AVRO_ENTITY_PARSER; + public static final String[] AVAILABLE_PARSERS = new String[] { + AVRO_ENTITY_PARSER + }; + + /** + * Policy used to handle non-recoverable failures. + */ + public static final String CONFIG_FAILURE_POLICY = "kite.failurePolicy"; + + /** + * Write non-recoverable Flume events to a Kite dataset. + */ + public static final String SAVE_FAILURE_POLICY = "save"; + + /** + * The URI to write non-recoverable Flume events to in the case of an error. + * If the dataset doesn't exist, it will be created. + */ + public static final String CONFIG_KITE_ERROR_DATASET_URI = + "kite.error.dataset.uri"; + + /** + * Retry non-recoverable Flume events. This will lead to a never ending cycle + * of failure, but matches the previous default semantics of the DatasetSink. + */ + public static final String RETRY_FAILURE_POLICY = "retry"; + public static final String DEFAULT_FAILURE_POLICY = RETRY_FAILURE_POLICY; + public static final String[] AVAILABLE_POLICIES = new String[] { + RETRY_FAILURE_POLICY, + SAVE_FAILURE_POLICY + }; + + /** + * Headers where avro schema information is expected. 
+ */ + public static final String AVRO_SCHEMA_LITERAL_HEADER = + "flume.avro.schema.literal"; + public static final String AVRO_SCHEMA_URL_HEADER = "flume.avro.schema.url"; + + /** + * Hadoop authentication settings + */ + public static final String AUTH_PROXY_USER = "auth.proxyUser"; + public static final String AUTH_PRINCIPAL = "auth.kerberosPrincipal"; + public static final String AUTH_KEYTAB = "auth.kerberosKeytab"; +} diff --git a/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/NonRecoverableEventException.java b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/NonRecoverableEventException.java new file mode 100644 index 0000000..4373429 --- /dev/null +++ b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/NonRecoverableEventException.java @@ -0,0 +1,53 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flume.sink.kite; + + +/** + * A non-recoverable error trying to deliver the event. + * + * Non-recoverable event delivery failures include: + * + * 1. Error parsing the event body thrown from the {@link EntityParser} + * 2. 
A schema mismatch between the schema of an event and the schema of the + * destination dataset. + * 3. A missing schema from the Event header when using the + * {@link org.apache.flume.sink.kite.parser.AvroParser}. + */ +public class NonRecoverableEventException extends Exception { + + private static final long serialVersionUID = 3485151222482254285L; + + public NonRecoverableEventException() { + super(); + } + + public NonRecoverableEventException(String message) { + super(message); + } + + public NonRecoverableEventException(String message, Throwable t) { + super(message, t); + } + + public NonRecoverableEventException(Throwable t) { + super(t); + } + +} diff --git a/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/parser/AvroParser.java b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/parser/AvroParser.java new file mode 100644 index 0000000..7c6a723 --- /dev/null +++ b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/parser/AvroParser.java @@ -0,0 +1,208 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flume.sink.kite.parser; + +import com.google.common.base.Preconditions; +import com.google.common.cache.CacheBuilder; +import com.google.common.cache.CacheLoader; +import com.google.common.cache.LoadingCache; +import com.google.common.util.concurrent.UncheckedExecutionException; +import java.io.IOException; +import java.io.InputStream; +import java.net.URI; +import java.net.URL; +import java.util.Locale; +import java.util.Map; +import java.util.concurrent.ExecutionException; +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericDatumReader; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.io.BinaryDecoder; +import org.apache.avro.io.DatumReader; +import org.apache.avro.io.DecoderFactory; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.EventDeliveryException; +import org.apache.flume.sink.kite.NonRecoverableEventException; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; + +import static org.apache.flume.sink.kite.DatasetSinkConstants.*; + +/** + * An {@link EntityParser} that parses Avro serialized bytes from an event. + * + * The Avro schema used to serialize the data should be set as either a URL + * or literal in the flume.avro.schema.url or flume.avro.schema.literal event + * headers respectively. + */ +public class AvroParser implements EntityParser { + + static Configuration conf = new Configuration(); + + /** + * A cache of literal schemas to avoid re-parsing the schema. + */ + private static final LoadingCache schemasFromLiteral = + CacheBuilder.newBuilder() + .build(new CacheLoader() { + @Override + public Schema load(String literal) { + Preconditions.checkNotNull(literal, + "Schema literal cannot be null without a Schema URL"); + return new Schema.Parser().parse(literal); + } + }); + + /** + * A cache of schemas retrieved by URL to avoid re-parsing the schema. 
+ */ + private static final LoadingCache<String, Schema> schemasFromURL = + CacheBuilder.newBuilder() + .build(new CacheLoader<String, Schema>() { + @Override + public Schema load(String url) throws IOException { + Schema.Parser parser = new Schema.Parser(); + InputStream is = null; + try { + FileSystem fs = FileSystem.get(URI.create(url), conf); + if (url.toLowerCase(Locale.ENGLISH).startsWith("hdfs:/")) { + is = fs.open(new Path(url)); + } else { + is = new URL(url).openStream(); + } + return parser.parse(is); + } finally { + if (is != null) { + is.close(); + } + } + } + }); + + /** + * The schema of the destination dataset. + * + * Used as the reader schema during parsing. + */ + private final Schema datasetSchema; + + /** + * A cache of DatumReaders per schema. + */ + private final LoadingCache<Schema, DatumReader<GenericRecord>> readers = + CacheBuilder.newBuilder() + .build(new CacheLoader<Schema, DatumReader<GenericRecord>>() { + @Override + public DatumReader<GenericRecord> load(Schema schema) { + // must use the target dataset's schema for reading to ensure the + // records are able to be stored using it + return new GenericDatumReader<GenericRecord>( + schema, datasetSchema); + } + }); + + /** + * The binary decoder to reuse for event parsing. + */ + private BinaryDecoder decoder = null; + + /** + * Create a new AvroParser given the schema of the destination dataset. + * + * @param datasetSchema The schema of the destination dataset. + */ + private AvroParser(Schema datasetSchema) { + this.datasetSchema = datasetSchema; + } + + /** + * Parse the entity from the body of the given event. + * + * @param event The event to parse. + * @param reuse If non-null, this may be reused and returned from this method. + * @return The parsed entity as a GenericRecord. + * @throws EventDeliveryException A recoverable error such as an error + * downloading the schema from the URL has + * occurred. 
+ * @throws NonRecoverableEventException A non-recoverable error such as an + * unparsable schema or entity has + * occurred. 
+ */ + @Override + public GenericRecord parse(Event event, GenericRecord reuse) + throws EventDeliveryException, NonRecoverableEventException { + decoder = DecoderFactory.get().binaryDecoder(event.getBody(), decoder); + + try { + DatumReader<GenericRecord> reader = readers.getUnchecked(schema(event)); + return reader.read(reuse, decoder); + } catch (IOException ex) { + throw new NonRecoverableEventException("Cannot deserialize event", ex); + } catch (RuntimeException ex) { + throw new NonRecoverableEventException("Cannot deserialize event", ex); + } + } + + /** + * Get the schema from the event headers. + * + * @param event The Flume event + * @return The schema for the event + * @throws EventDeliveryException A recoverable error such as an error + * downloading the schema from the URL has + * occurred. + * @throws NonRecoverableEventException A non-recoverable error such as an + * unparsable schema has occurred. + */ + private static Schema schema(Event event) throws EventDeliveryException, + NonRecoverableEventException { + Map<String, String> headers = event.getHeaders(); + String schemaURL = headers.get(AVRO_SCHEMA_URL_HEADER); + try { + if (schemaURL != null) { + return schemasFromURL.get(schemaURL); + } else { + String schemaLiteral = headers.get(AVRO_SCHEMA_LITERAL_HEADER); + if (schemaLiteral == null) { + throw new NonRecoverableEventException("No schema in event headers."
+ + " Headers must include either " + AVRO_SCHEMA_URL_HEADER + + " or " + AVRO_SCHEMA_LITERAL_HEADER); + } + + return schemasFromLiteral.get(schemaLiteral); + } + } catch (ExecutionException ex) { + throw new EventDeliveryException("Cannot get schema", ex.getCause()); + } catch (UncheckedExecutionException ex) { + throw new NonRecoverableEventException("Cannot parse schema", + ex.getCause()); + } + } + + public static class Builder implements EntityParser.Builder<GenericRecord> { + + @Override + public EntityParser<GenericRecord> build(Schema datasetSchema, Context config) { + return new AvroParser(datasetSchema); + } + + } +} diff --git a/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/parser/EntityParser.java b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/parser/EntityParser.java new file mode 100644 index 0000000..f2051a2 --- /dev/null +++ b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/parser/EntityParser.java @@ -0,0 +1,56 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +package org.apache.flume.sink.kite.parser; + +import javax.annotation.concurrent.NotThreadSafe; +import org.apache.avro.Schema; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.EventDeliveryException; +import org.apache.flume.sink.kite.NonRecoverableEventException; + +@NotThreadSafe +public interface EntityParser<E> { + + /** + * Parse a Kite entity from a Flume event + * + * @param event The event to parse + * @param reuse If non-null, this may be reused and returned + * @return The parsed entity + * @throws EventDeliveryException A recoverable error during parsing. Parsing + * can be safely retried. + * @throws NonRecoverableEventException A non-recoverable error during + * parsing. The event must be discarded. + * + */ + public E parse(Event event, E reuse) throws EventDeliveryException, + NonRecoverableEventException; + + /** + * Knows how to build {@code EntityParser}s. Implementers must provide a + * no-arg constructor. + * + * @param <E> The type of entities generated + */ + public static interface Builder<E> { + + public EntityParser<E> build(Schema datasetSchema, Context config); + } +} diff --git a/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/parser/EntityParserFactory.java b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/parser/EntityParserFactory.java new file mode 100644 index 0000000..3720ff3 --- /dev/null +++ b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/parser/EntityParserFactory.java @@ -0,0 +1,81 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License.
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flume.sink.kite.parser; + +import java.util.Arrays; +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.flume.Context; + +import static org.apache.flume.sink.kite.DatasetSinkConstants.*; + +public class EntityParserFactory { + + public EntityParser<GenericRecord> newParser(Schema datasetSchema, Context config) { + EntityParser<GenericRecord> parser; + + String parserType = config.getString(CONFIG_ENTITY_PARSER, + DEFAULT_ENTITY_PARSER); + + if (parserType.equals(AVRO_ENTITY_PARSER)) { + parser = new AvroParser.Builder().build(datasetSchema, config); + } else { + + Class<? extends EntityParser.Builder> builderClass; + Class c; + try { + c = Class.forName(parserType); + } catch (ClassNotFoundException ex) { + throw new IllegalArgumentException("EntityParser.Builder class " + + parserType + " not found. Must set " + CONFIG_ENTITY_PARSER + + " to a class that implements EntityParser.Builder or to a builtin" + + " parser: " + Arrays.toString(AVAILABLE_PARSERS), ex); + } + + if (c != null && EntityParser.Builder.class.isAssignableFrom(c)) { + builderClass = c; + } else { + throw new IllegalArgumentException("Class " + parserType + " does not" + + " implement EntityParser.Builder. Must set " + + CONFIG_ENTITY_PARSER + " to a class that extends" + + " EntityParser.Builder or to a builtin parser: " + + Arrays.toString(AVAILABLE_PARSERS)); + } + + EntityParser.Builder<GenericRecord> builder; + try { + builder = builderClass.newInstance(); + } catch (InstantiationException ex) { + throw new IllegalArgumentException("Can't instantiate class " + + parserType + ".
Must set " + CONFIG_ENTITY_PARSER + " to a class" + + " that extends EntityParser.Builder or to a builtin parser: " + + Arrays.toString(AVAILABLE_PARSERS), ex); + } catch (IllegalAccessException ex) { + throw new IllegalArgumentException("Can't instantiate class " + + parserType + ". Must set " + CONFIG_ENTITY_PARSER + " to a class" + + " that extends EntityParser.Builder or to a builtin parser: " + + Arrays.toString(AVAILABLE_PARSERS), ex); + } + + parser = builder.build(datasetSchema, config); + } + + return parser; + } +} diff --git a/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/FailurePolicy.java b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/FailurePolicy.java new file mode 100644 index 0000000..f6f875a --- /dev/null +++ b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/FailurePolicy.java @@ -0,0 +1,105 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flume.sink.kite.policy; + +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.EventDeliveryException; +import org.apache.flume.sink.kite.DatasetSink; +import org.kitesdk.data.Syncable; + +/** + * A policy for dealing with non-recoverable event delivery failures. + * + * Non-recoverable event delivery failures include: + * + * 1. Error parsing the event body thrown from the + * {@link org.apache.flume.sink.kite.parser.EntityParser} + * 2. A schema mismatch between the schema of an event and the schema of the + * destination dataset. + * 3. A missing schema from the Event header when using the + * {@link org.apache.flume.sink.kite.parser.AvroParser}. + * + * The life cycle of a FailurePolicy mimics the life cycle of the + * {@link DatasetSink#writer}: + * + * 1. When a new writer is created, the policy will be instantiated. + * 2. As Event failures happen, + * {@link #handle(org.apache.flume.Event, java.lang.Throwable)} will be + * called to let the policy handle the failure. + * 3. If the {@link DatasetSink} is configured to commit on batch, then the + * {@link #sync()} method will be called when the batch is committed. + * 4. When the writer is closed, the policy's {@link #close()} method will be + * called. + */ +public interface FailurePolicy { + + /** + * Handle a non-recoverable event. + * + * @param event The event + * @param cause The cause of the failure + * @throws EventDeliveryException The policy failed to handle the event. When + * this is thrown, the Flume transaction will + * be rolled back and the event will be retried + * along with the rest of the batch. + */ + public void handle(Event event, Throwable cause) + throws EventDeliveryException; + + /** + * Ensure any handled events are on stable storage. + * + * This allows the policy implementation to sync any data that it may not + * have fully handled. + * + * See {@link Syncable#sync()}. + * + * @throws EventDeliveryException The policy failed while syncing data.
+ * When this is thrown, the Flume transaction + * will be rolled back and the batch will be + * retried. + */ + public void sync() throws EventDeliveryException; + + /** + * Close this FailurePolicy and release any resources. + * + * @throws EventDeliveryException The policy failed while closing resources. + * When this is thrown, the Flume transaction + * will be rolled back and the batch will be + * retried. + */ + public void close() throws EventDeliveryException; + + /** + * Knows how to build {@code FailurePolicy}s. Implementers must provide a + * no-arg constructor. + */ + public static interface Builder { + + /** + * Build a new {@code FailurePolicy} + * + * @param config The Flume configuration context + * @return The {@code FailurePolicy} + */ + FailurePolicy build(Context config); + } + +} diff --git a/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/FailurePolicyFactory.java b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/FailurePolicyFactory.java new file mode 100644 index 0000000..d3b1fe8 --- /dev/null +++ b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/FailurePolicyFactory.java @@ -0,0 +1,81 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flume.sink.kite.policy; + +import java.util.Arrays; +import org.apache.flume.Context; + +import static org.apache.flume.sink.kite.DatasetSinkConstants.*; + +public class FailurePolicyFactory { + + public FailurePolicy newPolicy(Context config) { + FailurePolicy policy; + + String policyType = config.getString(CONFIG_FAILURE_POLICY, + DEFAULT_FAILURE_POLICY); + + if (policyType.equals(RETRY_FAILURE_POLICY)) { + policy = new RetryPolicy.Builder().build(config); + } else if (policyType.equals(SAVE_FAILURE_POLICY)) { + policy = new SavePolicy.Builder().build(config); + } else { + + Class<? extends FailurePolicy.Builder> builderClass; + Class c; + try { + c = Class.forName(policyType); + } catch (ClassNotFoundException ex) { + throw new IllegalArgumentException("FailurePolicy.Builder class " + + policyType + " not found. Must set " + CONFIG_FAILURE_POLICY + + " to a class that implements FailurePolicy.Builder or to a builtin" + + " policy: " + Arrays.toString(AVAILABLE_POLICIES), ex); + } + + if (c != null && FailurePolicy.Builder.class.isAssignableFrom(c)) { + builderClass = c; + } else { + throw new IllegalArgumentException("Class " + policyType + " does not" + + " implement FailurePolicy.Builder. Must set " + + CONFIG_FAILURE_POLICY + " to a class that extends" + + " FailurePolicy.Builder or to a builtin policy: " + + Arrays.toString(AVAILABLE_POLICIES)); + } + + FailurePolicy.Builder builder; + try { + builder = builderClass.newInstance(); + } catch (InstantiationException ex) { + throw new IllegalArgumentException("Can't instantiate class " + + policyType + ". Must set " + CONFIG_FAILURE_POLICY + " to a class" + + " that extends FailurePolicy.Builder or to a builtin policy: " + + Arrays.toString(AVAILABLE_POLICIES), ex); + } catch (IllegalAccessException ex) { + throw new IllegalArgumentException("Can't instantiate class " + + policyType + ".
Must set " + CONFIG_FAILURE_POLICY + " to a class" + + " that extends FailurePolicy.Builder or to a builtin policy: " + + Arrays.toString(AVAILABLE_POLICIES), ex); + } + + policy = builder.build(config); + } + + return policy; + } +} diff --git a/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/RetryPolicy.java b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/RetryPolicy.java new file mode 100644 index 0000000..9a4991c --- /dev/null +++ b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/RetryPolicy.java @@ -0,0 +1,63 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flume.sink.kite.policy; + +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.EventDeliveryException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * A failure policy that logs the error and then forces a retry by throwing + * {@link EventDeliveryException}. 
+ */ +public class RetryPolicy implements FailurePolicy { + private static final Logger LOG = LoggerFactory.getLogger(RetryPolicy.class); + + private RetryPolicy() { + } + + @Override + public void handle(Event event, Throwable cause) throws EventDeliveryException { + LOG.error("Event delivery failed: " + cause.getLocalizedMessage()); + LOG.debug("Exception follows.", cause); + + throw new EventDeliveryException(cause); + } + + @Override + public void sync() throws EventDeliveryException { + // do nothing + } + + @Override + public void close() throws EventDeliveryException { + // do nothing + } + + public static class Builder implements FailurePolicy.Builder { + + @Override + public FailurePolicy build(Context config) { + return new RetryPolicy(); + } + + } +} diff --git a/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/SavePolicy.java b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/SavePolicy.java new file mode 100644 index 0000000..bd537ec --- /dev/null +++ b/code/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/SavePolicy.java @@ -0,0 +1,128 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flume.sink.kite.policy; + +import com.google.common.base.Preconditions; +import com.google.common.collect.Maps; +import java.nio.ByteBuffer; +import java.util.Map; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.EventDeliveryException; +import org.apache.flume.source.avro.AvroFlumeEvent; +import org.kitesdk.data.DatasetDescriptor; +import org.kitesdk.data.DatasetWriter; +import org.kitesdk.data.Datasets; +import org.kitesdk.data.Formats; +import org.kitesdk.data.Syncable; +import org.kitesdk.data.View; + +import static org.apache.flume.sink.kite.DatasetSinkConstants.*; + +/** + * A failure policy that writes the raw Flume event to a Kite dataset. + */ +public class SavePolicy implements FailurePolicy { + + private final View dataset; + private DatasetWriter writer; + private int nEventsHandled; + + private SavePolicy(Context context) { + String uri = context.getString(CONFIG_KITE_ERROR_DATASET_URI); + Preconditions.checkArgument(uri != null, "Must set " + + CONFIG_KITE_ERROR_DATASET_URI + " when " + CONFIG_FAILURE_POLICY + + "=save"); + if (Datasets.exists(uri)) { + dataset = Datasets.load(uri, AvroFlumeEvent.class); + } else { + DatasetDescriptor descriptor = new DatasetDescriptor.Builder() + .schema(AvroFlumeEvent.class) + .build(); + dataset = Datasets.create(uri, descriptor, AvroFlumeEvent.class); + } + + nEventsHandled = 0; + } + + @Override + public void handle(Event event, Throwable cause) throws EventDeliveryException { + try { + if (writer == null) { + writer = dataset.newWriter(); + } + + final AvroFlumeEvent avroEvent = new AvroFlumeEvent(); + avroEvent.setBody(ByteBuffer.wrap(event.getBody())); + avroEvent.setHeaders(toCharSeqMap(event.getHeaders())); + + writer.write(avroEvent); + nEventsHandled++; + } catch (RuntimeException ex) { + throw new EventDeliveryException(ex); + } + } + + @Override + public void sync() throws EventDeliveryException { + if (nEventsHandled > 0) { + if 
(Formats.PARQUET.equals( + dataset.getDataset().getDescriptor().getFormat())) { + // We need to close the writer on sync if we're writing to a Parquet + // dataset + close(); + } else { + if (writer instanceof Syncable) { + ((Syncable) writer).sync(); + } + } + } + } + + @Override + public void close() throws EventDeliveryException { + if (nEventsHandled > 0) { + try { + writer.close(); + } catch (RuntimeException ex) { + throw new EventDeliveryException(ex); + } finally { + writer = null; + nEventsHandled = 0; + } + } + } + + /** + * Helper function to convert a map of String to a map of CharSequence. + */ + private static Map toCharSeqMap( + Map map) { + return Maps.newHashMap(map); + } + + public static class Builder implements FailurePolicy.Builder { + + @Override + public FailurePolicy build(Context config) { + return new SavePolicy(config); + } + + } +} diff --git a/code/flume-ng-sinks/flume-dataset-sink/src/test/java/org/apache/flume/sink/kite/TestDatasetSink.java b/code/flume-ng-sinks/flume-dataset-sink/src/test/java/org/apache/flume/sink/kite/TestDatasetSink.java new file mode 100644 index 0000000..3709577 --- /dev/null +++ b/code/flume-ng-sinks/flume-dataset-sink/src/test/java/org/apache/flume/sink/kite/TestDatasetSink.java @@ -0,0 +1,1036 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flume.sink.kite; + +import com.google.common.base.Function; +import com.google.common.base.Throwables; +import com.google.common.collect.Iterables; +import com.google.common.collect.Lists; +import com.google.common.collect.Maps; +import com.google.common.collect.Sets; +import org.apache.avro.Schema; +import org.apache.avro.file.DataFileWriter; +import org.apache.avro.generic.GenericData; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.generic.GenericRecordBuilder; +import org.apache.avro.io.Encoder; +import org.apache.avro.io.EncoderFactory; +import org.apache.avro.reflect.ReflectDatumWriter; +import org.apache.avro.util.Utf8; +import org.apache.commons.io.FileUtils; +import org.apache.flume.Channel; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.EventDeliveryException; +import org.apache.flume.Transaction; +import org.apache.flume.channel.MemoryChannel; +import org.apache.flume.conf.Configurables; +import org.apache.flume.event.SimpleEvent; +import org.apache.flume.sink.kite.parser.EntityParser; +import org.apache.flume.sink.kite.policy.FailurePolicy; +import org.apache.flume.source.avro.AvroFlumeEvent; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.hdfs.MiniDFSCluster; +import org.junit.After; +import org.junit.AfterClass; +import org.junit.Assert; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Test; +import org.kitesdk.data.Dataset; +import org.kitesdk.data.DatasetDescriptor; +import org.kitesdk.data.DatasetReader; +import org.kitesdk.data.DatasetWriter; +import org.kitesdk.data.Datasets; +import org.kitesdk.data.PartitionStrategy; +import org.kitesdk.data.View; + +import javax.annotation.Nullable; +import java.io.ByteArrayOutputStream; +import java.io.File; +import 
java.io.FileWriter; +import java.io.IOException; +import java.net.URI; +import java.nio.ByteBuffer; +import java.util.Arrays; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.Callable; + +import static org.mockito.Mockito.any; +import static org.mockito.Mockito.doThrow; +import static org.mockito.Mockito.eq; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.never; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.when; + +public class TestDatasetSink { + + public static final String FILE_REPO_URI = "repo:file:target/test_repo"; + public static final String DATASET_NAME = "test"; + public static final String FILE_DATASET_URI = + "dataset:file:target/test_repo/" + DATASET_NAME; + public static final String ERROR_DATASET_URI = + "dataset:file:target/test_repo/failed_events"; + public static final File SCHEMA_FILE = new File("target/record-schema.avsc"); + public static final Schema RECORD_SCHEMA = new Schema.Parser().parse( + "{\"type\":\"record\",\"name\":\"rec\",\"fields\":[" + + "{\"name\":\"id\",\"type\":\"string\"}," + + "{\"name\":\"msg\",\"type\":[\"string\",\"null\"]," + + "\"default\":\"default\"}]}"); + public static final Schema COMPATIBLE_SCHEMA = new Schema.Parser().parse( + "{\"type\":\"record\",\"name\":\"rec\",\"fields\":[" + + "{\"name\":\"id\",\"type\":\"string\"}]}"); + public static final Schema INCOMPATIBLE_SCHEMA = new Schema.Parser().parse( + "{\"type\":\"record\",\"name\":\"user\",\"fields\":[" + + "{\"name\":\"username\",\"type\":\"string\"}]}"); + public static final Schema UPDATED_SCHEMA = new Schema.Parser().parse( + "{\"type\":\"record\",\"name\":\"rec\",\"fields\":[" + + "{\"name\":\"id\",\"type\":\"string\"}," + + "{\"name\":\"priority\",\"type\":\"int\", \"default\": 0}," + + "{\"name\":\"msg\",\"type\":[\"string\",\"null\"]," + + "\"default\":\"default\"}]}"); + public static final DatasetDescriptor 
DESCRIPTOR = new DatasetDescriptor + .Builder() + .schema(RECORD_SCHEMA) + .build(); + + Context config = null; + Channel in = null; + List<GenericRecord> expected = null; + private static final String DFS_DIR = "target/test/dfs"; + private static final String TEST_BUILD_DATA_KEY = "test.build.data"; + private static String oldTestBuildDataProp = null; + + @BeforeClass + public static void saveSchema() throws IOException { + oldTestBuildDataProp = System.getProperty(TEST_BUILD_DATA_KEY); + System.setProperty(TEST_BUILD_DATA_KEY, DFS_DIR); + FileWriter schema = new FileWriter(SCHEMA_FILE); + schema.append(RECORD_SCHEMA.toString()); + schema.close(); + } + + @AfterClass + public static void tearDownClass() { + FileUtils.deleteQuietly(new File(DFS_DIR)); + if (oldTestBuildDataProp != null) { + System.setProperty(TEST_BUILD_DATA_KEY, oldTestBuildDataProp); + } + } + + @Before + public void setup() throws EventDeliveryException { + Datasets.delete(FILE_DATASET_URI); + Datasets.create(FILE_DATASET_URI, DESCRIPTOR); + + this.config = new Context(); + config.put("keep-alive", "0"); + this.in = new MemoryChannel(); + Configurables.configure(in, config); + + config.put(DatasetSinkConstants.CONFIG_KITE_DATASET_URI, FILE_DATASET_URI); + + GenericRecordBuilder builder = new GenericRecordBuilder(RECORD_SCHEMA); + expected = Lists.<GenericRecord>newArrayList( + builder.set("id", "1").set("msg", "msg1").build(), + builder.set("id", "2").set("msg", "msg2").build(), + builder.set("id", "3").set("msg", "msg3").build()); + + putToChannel(in, Iterables.transform(expected, + new Function<GenericRecord, Event>() { + private int i = 0; + + @Override + public Event apply(@Nullable GenericRecord rec) { + this.i += 1; + boolean useURI = (i % 2) == 0; + return event(rec, RECORD_SCHEMA, SCHEMA_FILE, useURI); + } + })); + } + + @After + public void teardown() { + Datasets.delete(FILE_DATASET_URI); + } + + @Test + public void testOldConfig() throws EventDeliveryException { + config.put(DatasetSinkConstants.CONFIG_KITE_DATASET_URI, null); +
config.put(DatasetSinkConstants.CONFIG_KITE_REPO_URI, FILE_REPO_URI); + config.put(DatasetSinkConstants.CONFIG_KITE_DATASET_NAME, DATASET_NAME); + + DatasetSink sink = sink(in, config); + + // run the sink + sink.start(); + sink.process(); + sink.stop(); + + Assert.assertEquals( + Sets.newHashSet(expected), + read(Datasets.load(FILE_DATASET_URI))); + Assert.assertEquals("Should have committed", 0, remaining(in)); + } + + @Test + public void testDatasetUriOverridesOldConfig() throws EventDeliveryException { + // CONFIG_KITE_DATASET_URI is still set, otherwise this will cause an error + config.put(DatasetSinkConstants.CONFIG_KITE_REPO_URI, "bad uri"); + config.put(DatasetSinkConstants.CONFIG_KITE_DATASET_NAME, ""); + + DatasetSink sink = sink(in, config); + + // run the sink + sink.start(); + sink.process(); + sink.stop(); + + Assert.assertEquals( + Sets.newHashSet(expected), + read(Datasets.load(FILE_DATASET_URI))); + Assert.assertEquals("Should have committed", 0, remaining(in)); + } + + @Test + public void testFileStore() + throws EventDeliveryException, NonRecoverableEventException { + DatasetSink sink = sink(in, config); + + // run the sink + sink.start(); + sink.process(); + sink.stop(); + + Assert.assertEquals( + Sets.newHashSet(expected), + read(Datasets.load(FILE_DATASET_URI))); + Assert.assertEquals("Should have committed", 0, remaining(in)); + } + + @Test + public void testParquetDataset() throws EventDeliveryException { + Datasets.delete(FILE_DATASET_URI); + Dataset created = Datasets.create(FILE_DATASET_URI, + new DatasetDescriptor.Builder(DESCRIPTOR) + .format("parquet") + .build()); + + DatasetSink sink = sink(in, config); + + // run the sink + sink.start(); + sink.process(); + + // the transaction should not commit during the call to process + assertThrows("Transaction should still be open", IllegalStateException.class, + new Callable() { + @Override + public Object call() throws EventDeliveryException { +
in.getTransaction().begin(); + return null; + } + }); + // The records won't commit until the call to stop() + Assert.assertEquals("Should not have committed", 0, read(created).size()); + + sink.stop(); + + Assert.assertEquals(Sets.newHashSet(expected), read(created)); + Assert.assertEquals("Should have committed", 0, remaining(in)); + } + + @Test + public void testPartitionedData() throws EventDeliveryException { + URI partitionedUri = URI.create("dataset:file:target/test_repo/partitioned"); + try { + Datasets.create(partitionedUri, new DatasetDescriptor.Builder(DESCRIPTOR) + .partitionStrategy(new PartitionStrategy.Builder() + .identity("id", 10) // partition by id + .build()) + .build()); + + config.put(DatasetSinkConstants.CONFIG_KITE_DATASET_URI, + partitionedUri.toString()); + DatasetSink sink = sink(in, config); + + // run the sink + sink.start(); + sink.process(); + sink.stop(); + + Assert.assertEquals( + Sets.newHashSet(expected), + read(Datasets.load(partitionedUri))); + Assert.assertEquals("Should have committed", 0, remaining(in)); + } finally { + if (Datasets.exists(partitionedUri)) { + Datasets.delete(partitionedUri); + } + } + } + + @Test + public void testStartBeforeDatasetCreated() throws EventDeliveryException { + // delete the dataset created by setup + Datasets.delete(FILE_DATASET_URI); + + DatasetSink sink = sink(in, config); + + // start the sink + sink.start(); + + // run the sink without a target dataset + try { + sink.process(); + Assert.fail("Should have thrown an exception: no such dataset"); + } catch (EventDeliveryException e) { + // expected + } + + // create the target dataset + Datasets.create(FILE_DATASET_URI, DESCRIPTOR); + + // run the sink + sink.process(); + sink.stop(); + + Assert.assertEquals(Sets.newHashSet(expected), read(Datasets.load(FILE_DATASET_URI))); + Assert.assertEquals("Should have committed", 0, remaining(in)); + } + + @Test + public void testDatasetUpdate() throws EventDeliveryException { + // add an updated 
record that is missing the msg field + GenericRecordBuilder updatedBuilder = new GenericRecordBuilder(UPDATED_SCHEMA); + GenericData.Record updatedRecord = updatedBuilder + .set("id", "0") + .set("priority", 1) + .set("msg", "Priority 1 message!") + .build(); + + // make a set of the expected records with the new schema + Set expectedAsUpdated = Sets.newHashSet(); + for (GenericRecord record : expected) { + expectedAsUpdated.add(updatedBuilder + .clear("priority") + .set("id", record.get("id")) + .set("msg", record.get("msg")) + .build()); + } + expectedAsUpdated.add(updatedRecord); + + DatasetSink sink = sink(in, config); + + // run the sink + sink.start(); + sink.process(); + + // update the dataset's schema + DatasetDescriptor updated = new DatasetDescriptor + .Builder(Datasets.load(FILE_DATASET_URI).getDataset().getDescriptor()) + .schema(UPDATED_SCHEMA) + .build(); + Datasets.update(FILE_DATASET_URI, updated); + + // trigger a roll on the next process call to refresh the writer + sink.roll(); + + // add the record to the incoming channel and the expected list + putToChannel(in, event(updatedRecord, UPDATED_SCHEMA, null, false)); + + // process events with the updated schema + sink.process(); + sink.stop(); + + Assert.assertEquals(expectedAsUpdated, read(Datasets.load(FILE_DATASET_URI))); + Assert.assertEquals("Should have committed", 0, remaining(in)); + } + + @Test + public void testMiniClusterStore() throws EventDeliveryException, IOException { + // setup a minicluster + MiniDFSCluster cluster = new MiniDFSCluster + .Builder(new Configuration()) + .build(); + + FileSystem dfs = cluster.getFileSystem(); + Configuration conf = dfs.getConf(); + + URI hdfsUri = URI.create( + "dataset:" + conf.get("fs.defaultFS") + "/tmp/repo" + DATASET_NAME); + try { + // create a repository and dataset in HDFS + Datasets.create(hdfsUri, DESCRIPTOR); + + // update the config to use the HDFS repository + config.put(DatasetSinkConstants.CONFIG_KITE_DATASET_URI, 
hdfsUri.toString()); + + DatasetSink sink = sink(in, config); + + // run the sink + sink.start(); + sink.process(); + sink.stop(); + + Assert.assertEquals( + Sets.newHashSet(expected), + read(Datasets.load(hdfsUri))); + Assert.assertEquals("Should have committed", 0, remaining(in)); + + } finally { + if (Datasets.exists(hdfsUri)) { + Datasets.delete(hdfsUri); + } + cluster.shutdown(); + } + } + + @Test + public void testBatchSize() throws EventDeliveryException { + DatasetSink sink = sink(in, config); + + // release one record per process call + config.put("kite.batchSize", "2"); + Configurables.configure(sink, config); + + sink.start(); + sink.process(); // process the first and second + sink.roll(); // roll at the next process call + sink.process(); // roll and process the third + Assert.assertEquals( + Sets.newHashSet(expected.subList(0, 2)), + read(Datasets.load(FILE_DATASET_URI))); + Assert.assertEquals("Should have committed", 0, remaining(in)); + sink.roll(); // roll at the next process call + sink.process(); // roll, the channel is empty + Assert.assertEquals( + Sets.newHashSet(expected), + read(Datasets.load(FILE_DATASET_URI))); + sink.stop(); + } + + @Test + public void testTimedFileRolling() + throws EventDeliveryException, InterruptedException { + // use a new roll interval + config.put("kite.rollInterval", "1"); // in seconds + + DatasetSink sink = sink(in, config); + + Dataset records = Datasets.load(FILE_DATASET_URI); + + // run the sink + sink.start(); + sink.process(); + + Assert.assertEquals("Should have committed", 0, remaining(in)); + + Thread.sleep(1100); // sleep longer than the roll interval + sink.process(); // rolling happens in the process method + + Assert.assertEquals(Sets.newHashSet(expected), read(records)); + + // wait until the end to stop because it would close the files + sink.stop(); + } + + @Test + public void testCompatibleSchemas() throws EventDeliveryException { + DatasetSink sink = sink(in, config); + + // add a compatible 
record that is missing the msg field + GenericRecordBuilder compatBuilder = new GenericRecordBuilder( + COMPATIBLE_SCHEMA); + GenericData.Record compatibleRecord = compatBuilder.set("id", "0").build(); + + // add the record to the incoming channel + putToChannel(in, event(compatibleRecord, COMPATIBLE_SCHEMA, null, false)); + + // the record will be read using the real schema, so create the expected + // record using it, but without any data + + GenericRecordBuilder builder = new GenericRecordBuilder(RECORD_SCHEMA); + GenericData.Record expectedRecord = builder.set("id", "0").build(); + expected.add(expectedRecord); + + // run the sink + sink.start(); + sink.process(); + sink.stop(); + + Assert.assertEquals( + Sets.newHashSet(expected), + read(Datasets.load(FILE_DATASET_URI))); + Assert.assertEquals("Should have committed", 0, remaining(in)); + } + + @Test + public void testIncompatibleSchemas() throws EventDeliveryException { + final DatasetSink sink = sink(in, config); + + GenericRecordBuilder builder = new GenericRecordBuilder( + INCOMPATIBLE_SCHEMA); + GenericData.Record rec = builder.set("username", "koala").build(); + putToChannel(in, event(rec, INCOMPATIBLE_SCHEMA, null, false)); + + // run the sink + sink.start(); + assertThrows("Should fail", EventDeliveryException.class, + new Callable() { + @Override + public Object call() throws EventDeliveryException { + sink.process(); + return null; + } + }); + sink.stop(); + + Assert.assertEquals("Should have rolled back", + expected.size() + 1, remaining(in)); + } + + @Test + public void testMissingSchema() throws EventDeliveryException { + final DatasetSink sink = sink(in, config); + + Event badEvent = new SimpleEvent(); + badEvent.setHeaders(Maps.newHashMap()); + badEvent.setBody(serialize(expected.get(0), RECORD_SCHEMA)); + putToChannel(in, badEvent); + + // run the sink + sink.start(); + assertThrows("Should fail", EventDeliveryException.class, + new Callable() { + @Override + public Object call() throws 
EventDeliveryException { + sink.process(); + return null; + } + }); + sink.stop(); + + Assert.assertEquals("Should have rolled back", + expected.size() + 1, remaining(in)); + } + + @Test + public void testFileStoreWithSavePolicy() throws EventDeliveryException { + if (Datasets.exists(ERROR_DATASET_URI)) { + Datasets.delete(ERROR_DATASET_URI); + } + config.put(DatasetSinkConstants.CONFIG_FAILURE_POLICY, + DatasetSinkConstants.SAVE_FAILURE_POLICY); + config.put(DatasetSinkConstants.CONFIG_KITE_ERROR_DATASET_URI, + ERROR_DATASET_URI); + DatasetSink sink = sink(in, config); + + // run the sink + sink.start(); + sink.process(); + sink.stop(); + + Assert.assertEquals( + Sets.newHashSet(expected), + read(Datasets.load(FILE_DATASET_URI))); + Assert.assertEquals("Should have committed", 0, remaining(in)); + } + + @Test + public void testMissingSchemaWithSavePolicy() throws EventDeliveryException { + if (Datasets.exists(ERROR_DATASET_URI)) { + Datasets.delete(ERROR_DATASET_URI); + } + config.put(DatasetSinkConstants.CONFIG_FAILURE_POLICY, + DatasetSinkConstants.SAVE_FAILURE_POLICY); + config.put(DatasetSinkConstants.CONFIG_KITE_ERROR_DATASET_URI, + ERROR_DATASET_URI); + final DatasetSink sink = sink(in, config); + + Event badEvent = new SimpleEvent(); + badEvent.setHeaders(Maps.newHashMap()); + badEvent.setBody(serialize(expected.get(0), RECORD_SCHEMA)); + putToChannel(in, badEvent); + + // run the sink + sink.start(); + sink.process(); + sink.stop(); + + Assert.assertEquals("Good records should have been written", + Sets.newHashSet(expected), + read(Datasets.load(FILE_DATASET_URI))); + Assert.assertEquals("Should not have rolled back", 0, remaining(in)); + Assert.assertEquals("Should have saved the bad event", + Sets.newHashSet(AvroFlumeEvent.newBuilder() + .setBody(ByteBuffer.wrap(badEvent.getBody())) + .setHeaders(toUtf8Map(badEvent.getHeaders())) + .build()), + read(Datasets.load(ERROR_DATASET_URI, AvroFlumeEvent.class))); + } + + @Test + public void 
testSerializedWithIncompatibleSchemasWithSavePolicy() + throws EventDeliveryException { + if (Datasets.exists(ERROR_DATASET_URI)) { + Datasets.delete(ERROR_DATASET_URI); + } + config.put(DatasetSinkConstants.CONFIG_FAILURE_POLICY, + DatasetSinkConstants.SAVE_FAILURE_POLICY); + config.put(DatasetSinkConstants.CONFIG_KITE_ERROR_DATASET_URI, + ERROR_DATASET_URI); + final DatasetSink sink = sink(in, config); + + GenericRecordBuilder builder = new GenericRecordBuilder( + INCOMPATIBLE_SCHEMA); + GenericData.Record rec = builder.set("username", "koala").build(); + + // We pass in a valid schema in the header, but an incompatible schema + // was used to serialize the record + Event badEvent = event(rec, INCOMPATIBLE_SCHEMA, SCHEMA_FILE, true); + putToChannel(in, badEvent); + + // run the sink + sink.start(); + sink.process(); + sink.stop(); + + Assert.assertEquals("Good records should have been written", + Sets.newHashSet(expected), + read(Datasets.load(FILE_DATASET_URI))); + Assert.assertEquals("Should not have rolled back", 0, remaining(in)); + Assert.assertEquals("Should have saved the bad event", + Sets.newHashSet(AvroFlumeEvent.newBuilder() + .setBody(ByteBuffer.wrap(badEvent.getBody())) + .setHeaders(toUtf8Map(badEvent.getHeaders())) + .build()), + read(Datasets.load(ERROR_DATASET_URI, AvroFlumeEvent.class))); + } + + @Test + public void testSerializedWithIncompatibleSchemas() throws EventDeliveryException { + final DatasetSink sink = sink(in, config); + + GenericRecordBuilder builder = new GenericRecordBuilder( + INCOMPATIBLE_SCHEMA); + GenericData.Record rec = builder.set("username", "koala").build(); + + // We pass in a valid schema in the header, but an incompatible schema + // was used to serialize the record + putToChannel(in, event(rec, INCOMPATIBLE_SCHEMA, SCHEMA_FILE, true)); + + // run the sink + sink.start(); + assertThrows("Should fail", EventDeliveryException.class, + new Callable() { + @Override + public Object call() throws EventDeliveryException { + 
sink.process(); + return null; + } + }); + sink.stop(); + + Assert.assertEquals("Should have rolled back", + expected.size() + 1, remaining(in)); + } + + @Test + public void testCommitOnBatch() throws EventDeliveryException { + DatasetSink sink = sink(in, config); + + // run the sink + sink.start(); + sink.process(); + + // the transaction should commit during the call to process + Assert.assertEquals("Should have committed", 0, remaining(in)); + // but the data won't be visible yet + Assert.assertEquals(0, + read(Datasets.load(FILE_DATASET_URI)).size()); + + sink.stop(); + + Assert.assertEquals( + Sets.newHashSet(expected), + read(Datasets.load(FILE_DATASET_URI))); + } + + @Test + public void testCommitOnBatchFalse() throws EventDeliveryException { + config.put(DatasetSinkConstants.CONFIG_FLUSHABLE_COMMIT_ON_BATCH, + Boolean.toString(false)); + config.put(DatasetSinkConstants.CONFIG_SYNCABLE_SYNC_ON_BATCH, + Boolean.toString(false)); + DatasetSink sink = sink(in, config); + + // run the sink + sink.start(); + sink.process(); + + // the transaction should not commit during the call to process + assertThrows("Transaction should still be open", IllegalStateException.class, + new Callable() { + @Override + public Object call() throws EventDeliveryException { + in.getTransaction().begin(); + return null; + } + }); + + // the data won't be visible + Assert.assertEquals(0, + read(Datasets.load(FILE_DATASET_URI)).size()); + + sink.stop(); + + Assert.assertEquals( + Sets.newHashSet(expected), + read(Datasets.load(FILE_DATASET_URI))); + // the transaction should commit during the call to stop + Assert.assertEquals("Should have committed", 0, remaining(in)); + } + + @Test + public void testCommitOnBatchFalseSyncOnBatchTrue() throws EventDeliveryException { + config.put(DatasetSinkConstants.CONFIG_FLUSHABLE_COMMIT_ON_BATCH, + Boolean.toString(false)); + config.put(DatasetSinkConstants.CONFIG_SYNCABLE_SYNC_ON_BATCH, + Boolean.toString(true)); + + try { + sink(in, config); + 
Assert.fail("Should have thrown IllegalArgumentException"); + } catch (IllegalArgumentException ex) { + // expected + } + } + + @Test + public void testCloseAndCreateWriter() throws EventDeliveryException { + config.put(DatasetSinkConstants.CONFIG_FLUSHABLE_COMMIT_ON_BATCH, + Boolean.toString(false)); + config.put(DatasetSinkConstants.CONFIG_SYNCABLE_SYNC_ON_BATCH, + Boolean.toString(false)); + DatasetSink sink = sink(in, config); + + // run the sink + sink.start(); + sink.process(); + + sink.closeWriter(); + sink.commitTransaction(); + sink.createWriter(); + + Assert.assertNotNull("Writer should not be null", sink.getWriter()); + Assert.assertEquals("Should have committed", 0, remaining(in)); + + sink.stop(); + + Assert.assertEquals( + Sets.newHashSet(expected), + read(Datasets.load(FILE_DATASET_URI))); + } + + @Test + public void testCloseWriter() throws EventDeliveryException { + config.put(DatasetSinkConstants.CONFIG_FLUSHABLE_COMMIT_ON_BATCH, + Boolean.toString(false)); + config.put(DatasetSinkConstants.CONFIG_SYNCABLE_SYNC_ON_BATCH, + Boolean.toString(false)); + DatasetSink sink = sink(in, config); + + // run the sink + sink.start(); + sink.process(); + + sink.closeWriter(); + sink.commitTransaction(); + + Assert.assertNull("Writer should be null", sink.getWriter()); + Assert.assertEquals("Should have committed", 0, remaining(in)); + + sink.stop(); + + Assert.assertEquals( + Sets.newHashSet(expected), + read(Datasets.load(FILE_DATASET_URI))); + } + + @Test + public void testCreateWriter() throws EventDeliveryException { + config.put(DatasetSinkConstants.CONFIG_FLUSHABLE_COMMIT_ON_BATCH, + Boolean.toString(false)); + config.put(DatasetSinkConstants.CONFIG_SYNCABLE_SYNC_ON_BATCH, + Boolean.toString(false)); + DatasetSink sink = sink(in, config); + + // run the sink + sink.start(); + sink.process(); + + sink.commitTransaction(); + sink.createWriter(); + Assert.assertNotNull("Writer should not be null", sink.getWriter()); + Assert.assertEquals("Should have 
committed", 0, remaining(in)); + + sink.stop(); + + Assert.assertEquals(0, read(Datasets.load(FILE_DATASET_URI)).size()); + } + + @Test + public void testAppendWriteExceptionInvokesPolicy() + throws EventDeliveryException, NonRecoverableEventException { + DatasetSink sink = sink(in, config); + + // run the sink + sink.start(); + sink.process(); + + // Mock an Event + Event mockEvent = mock(Event.class); + when(mockEvent.getBody()).thenReturn(new byte[] { 0x01 }); + + // Mock a GenericRecord + GenericRecord mockRecord = mock(GenericRecord.class); + + // Mock an EntityParser + EntityParser mockParser = mock(EntityParser.class); + when(mockParser.parse(eq(mockEvent), any(GenericRecord.class))) + .thenReturn(mockRecord); + sink.setParser(mockParser); + + // Mock a FailurePolicy + FailurePolicy mockFailurePolicy = mock(FailurePolicy.class); + sink.setFailurePolicy(mockFailurePolicy); + + // Mock a DatasetWriter + DatasetWriter mockWriter = mock(DatasetWriter.class); + doThrow(new DataFileWriter.AppendWriteException(new IOException())) + .when(mockWriter).write(mockRecord); + + sink.setWriter(mockWriter); + sink.write(mockEvent); + + // Verify that the event was sent to the failure policy + verify(mockFailurePolicy).handle(eq(mockEvent), any(Throwable.class)); + + sink.stop(); + } + + @Test + public void testRuntimeExceptionThrowsEventDeliveryException() + throws EventDeliveryException, NonRecoverableEventException { + DatasetSink sink = sink(in, config); + + // run the sink + sink.start(); + sink.process(); + + // Mock an Event + Event mockEvent = mock(Event.class); + when(mockEvent.getBody()).thenReturn(new byte[] { 0x01 }); + + // Mock a GenericRecord + GenericRecord mockRecord = mock(GenericRecord.class); + + // Mock an EntityParser + EntityParser mockParser = mock(EntityParser.class); + when(mockParser.parse(eq(mockEvent), any(GenericRecord.class))) + .thenReturn(mockRecord); + sink.setParser(mockParser); + + // Mock a FailurePolicy + FailurePolicy mockFailurePolicy 
= mock(FailurePolicy.class); + sink.setFailurePolicy(mockFailurePolicy); + + // Mock a DatasetWriter + DatasetWriter mockWriter = mock(DatasetWriter.class); + doThrow(new RuntimeException()).when(mockWriter).write(mockRecord); + + sink.setWriter(mockWriter); + + try { + sink.write(mockEvent); + Assert.fail("Should throw EventDeliveryException"); + } catch (EventDeliveryException ex) { + + } + + // Verify that the event was not sent to the failure policy + verify(mockFailurePolicy, never()).handle(eq(mockEvent), any(Throwable.class)); + + sink.stop(); + } + + @Test + public void testProcessHandlesNullWriter() throws EventDeliveryException, + NonRecoverableEventException, NonRecoverableEventException { + DatasetSink sink = sink(in, config); + + // run the sink + sink.start(); + sink.process(); + + // explicitly set the writer to null + sink.setWriter(null); + + // this should not throw an NPE + sink.process(); + + sink.stop(); + + Assert.assertEquals("Should have committed", 0, remaining(in)); + } + + public static DatasetSink sink(Channel in, Context config) { + DatasetSink sink = new DatasetSink(); + sink.setChannel(in); + Configurables.configure(sink, config); + return sink; + } + + public static HashSet read(View view) { + DatasetReader reader = null; + try { + reader = view.newReader(); + return Sets.newHashSet(reader.iterator()); + } finally { + if (reader != null) { + reader.close(); + } + } + } + + public static int remaining(Channel ch) throws EventDeliveryException { + Transaction t = ch.getTransaction(); + try { + t.begin(); + int count = 0; + while (ch.take() != null) { + count += 1; + } + t.commit(); + return count; + } catch (Throwable th) { + t.rollback(); + Throwables.propagateIfInstanceOf(th, Error.class); + Throwables.propagateIfInstanceOf(th, EventDeliveryException.class); + throw new EventDeliveryException(th); + } finally { + t.close(); + } + } + + public static void putToChannel(Channel in, Event... 
records) + throws EventDeliveryException { + putToChannel(in, Arrays.asList(records)); + } + + public static void putToChannel(Channel in, Iterable records) + throws EventDeliveryException { + Transaction t = in.getTransaction(); + try { + t.begin(); + for (Event record : records) { + in.put(record); + } + t.commit(); + } catch (Throwable th) { + t.rollback(); + Throwables.propagateIfInstanceOf(th, Error.class); + Throwables.propagateIfInstanceOf(th, EventDeliveryException.class); + throw new EventDeliveryException(th); + } finally { + t.close(); + } + } + + public static Event event( + Object datum, Schema schema, File file, boolean useURI) { + Map headers = Maps.newHashMap(); + if (useURI) { + headers.put(DatasetSinkConstants.AVRO_SCHEMA_URL_HEADER, + file.getAbsoluteFile().toURI().toString()); + } else { + headers.put(DatasetSinkConstants.AVRO_SCHEMA_LITERAL_HEADER, + schema.toString()); + } + Event e = new SimpleEvent(); + e.setBody(serialize(datum, schema)); + e.setHeaders(headers); + return e; + } + + @SuppressWarnings("unchecked") + public static byte[] serialize(Object datum, Schema schema) { + ByteArrayOutputStream out = new ByteArrayOutputStream(); + Encoder encoder = EncoderFactory.get().binaryEncoder(out, null); + ReflectDatumWriter writer = new ReflectDatumWriter(schema); + try { + writer.write(datum, encoder); + encoder.flush(); + } catch (IOException ex) { + Throwables.propagate(ex); + } + return out.toByteArray(); + } + + /** + * A convenience method to avoid a large number of @Test(expected=...) tests. + * + * This variant uses a Callable, which is allowed to throw checked Exceptions. 
+ * + * @param message A String message to describe this assertion + * @param expected An Exception class that the Runnable should throw + * @param callable A Callable that is expected to throw the exception + */ + public static void assertThrows( + String message, Class expected, Callable callable) { + try { + callable.call(); + Assert.fail("No exception was thrown (" + message + "), expected: " + + expected.getName()); + } catch (Exception actual) { + Assert.assertEquals(message, expected, actual.getClass()); + } + } + + /** + * Helper function to convert a map of String to a map of Utf8. + * + * @param map A Map of String to String + * @return The same mappings converting the {@code String}s to {@link Utf8}s + */ + public static Map toUtf8Map( + Map map) { + Map utf8Map = Maps.newHashMap(); + for (Map.Entry entry : map.entrySet()) { + utf8Map.put(new Utf8(entry.getKey()), new Utf8(entry.getValue())); + } + return utf8Map; + } +} diff --git a/code/flume-ng-sinks/flume-dataset-sink/src/test/resources/enable-kerberos.xml b/code/flume-ng-sinks/flume-dataset-sink/src/test/resources/enable-kerberos.xml new file mode 100644 index 0000000..85b0447 --- /dev/null +++ b/code/flume-ng-sinks/flume-dataset-sink/src/test/resources/enable-kerberos.xml @@ -0,0 +1,30 @@ + + + + + + + hadoop.security.authentication + kerberos + + + + hadoop.security.authorization + true + + + diff --git a/code/flume-ng-sinks/flume-hdfs-sink/pom.xml b/code/flume-ng-sinks/flume-hdfs-sink/pom.xml new file mode 100644 index 0000000..bcf6556 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/pom.xml @@ -0,0 +1,196 @@ + + + + + 4.0.0 + + + flume-ng-sinks + org.apache.flume + 1.7.0 + + + org.apache.flume.flume-ng-sinks + flume-hdfs-sink + Flume NG HDFS Sink + + + + + org.apache.rat + apache-rat-plugin + + + + + + + + org.apache.flume + flume-ng-sdk + + + + org.apache.flume + flume-ng-configuration + + + + org.apache.flume + flume-ng-core + + + + org.slf4j + slf4j-api + + + + com.google.guava + guava 
+ + + + junit + junit + test + + + + org.slf4j + slf4j-log4j12 + test + + + + org.mockito + mockito-all + test + + + + org.apache.hadoop + ${hadoop.common.artifact.id} + true + + + + commons-lang + commons-lang + + + + commons-io + commons-io + + + + + + + + hadoop-1.0 + + + flume.hadoop.profile + 1 + + + + + + org.apache.hadoop + hadoop-test + test + + + + + com.sun.jersey + jersey-core + test + + + + + + + hadoop-2 + + + flume.hadoop.profile + 2 + + + + + + org.apache.hadoop + hadoop-hdfs + true + + + + org.apache.hadoop + hadoop-auth + true + + + + org.apache.hadoop + hadoop-minicluster + test + + + + + + + hbase-1 + + + !flume.hadoop.profile + + + + + + org.apache.hadoop + hadoop-hdfs + true + + + + org.apache.hadoop + hadoop-auth + true + + + + org.apache.hadoop + hadoop-minicluster + test + + + + + + + diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/AbstractHDFSWriter.java b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/AbstractHDFSWriter.java new file mode 100644 index 0000000..2fe309f --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/AbstractHDFSWriter.java @@ -0,0 +1,280 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.flume.sink.hdfs; + +import com.google.common.base.Preconditions; +import org.apache.flume.Context; +import org.apache.flume.FlumeException; +import org.apache.flume.annotations.InterfaceAudience; +import org.apache.flume.annotations.InterfaceStability; +import org.apache.hadoop.fs.FSDataOutputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.io.OutputStream; +import java.lang.reflect.InvocationTargetException; +import java.lang.reflect.Method; + +@InterfaceAudience.Private +@InterfaceStability.Evolving +public abstract class AbstractHDFSWriter implements HDFSWriter { + + private static final Logger logger = + LoggerFactory.getLogger(AbstractHDFSWriter.class); + + private FSDataOutputStream outputStream; + private FileSystem fs; + private Path destPath; + private Method refGetNumCurrentReplicas = null; + private Method refGetDefaultReplication = null; + private Method refHflushOrSync = null; + private Integer configuredMinReplicas = null; + private Integer numberOfCloseRetries = null; + private long timeBetweenCloseRetries = Long.MAX_VALUE; + + static final Object[] NO_ARGS = new Object[]{}; + + @Override + public void configure(Context context) { + configuredMinReplicas = context.getInteger("hdfs.minBlockReplicas"); + if (configuredMinReplicas != null) { + Preconditions.checkArgument(configuredMinReplicas >= 0, + "hdfs.minBlockReplicas must be greater than or equal to 0"); + } + numberOfCloseRetries = context.getInteger("hdfs.closeTries", 1) - 1; + + if (numberOfCloseRetries > 1) { + try { + timeBetweenCloseRetries = context.getLong("hdfs.callTimeout", 10000L); + } catch (NumberFormatException e) { + logger.warn("hdfs.callTimeout can not be parsed to a long: " + + context.getLong("hdfs.callTimeout")); 
+ } + timeBetweenCloseRetries = Math.max(timeBetweenCloseRetries / numberOfCloseRetries, 1000); + } + + } + + /** + * Contract for subclasses: Call registerCurrentStream() on open, + * unregisterCurrentStream() on close, and the base class takes care of the + * rest. + * @return + */ + @Override + public boolean isUnderReplicated() { + try { + int numBlocks = getNumCurrentReplicas(); + if (numBlocks == -1) { + return false; + } + int desiredBlocks; + if (configuredMinReplicas != null) { + desiredBlocks = configuredMinReplicas; + } else { + desiredBlocks = getFsDesiredReplication(); + } + return numBlocks < desiredBlocks; + } catch (IllegalAccessException e) { + logger.error("Unexpected error while checking replication factor", e); + } catch (InvocationTargetException e) { + logger.error("Unexpected error while checking replication factor", e); + } catch (IllegalArgumentException e) { + logger.error("Unexpected error while checking replication factor", e); + } + return false; + } + + protected void registerCurrentStream(FSDataOutputStream outputStream, + FileSystem fs, Path destPath) { + Preconditions.checkNotNull(outputStream, "outputStream must not be null"); + Preconditions.checkNotNull(fs, "fs must not be null"); + Preconditions.checkNotNull(destPath, "destPath must not be null"); + + this.outputStream = outputStream; + this.fs = fs; + this.destPath = destPath; + this.refGetNumCurrentReplicas = reflectGetNumCurrentReplicas(outputStream); + this.refGetDefaultReplication = reflectGetDefaultReplication(fs); + this.refHflushOrSync = reflectHflushOrSync(outputStream); + + } + + protected void unregisterCurrentStream() { + this.outputStream = null; + this.fs = null; + this.destPath = null; + this.refGetNumCurrentReplicas = null; + this.refGetDefaultReplication = null; + } + + public int getFsDesiredReplication() { + short replication = 0; + if (fs != null && destPath != null) { + if (refGetDefaultReplication != null) { + try { + replication = (Short) 
refGetDefaultReplication.invoke(fs, destPath);
+        } catch (IllegalAccessException e) {
+          logger.warn("Unexpected error calling getDefaultReplication(Path)", e);
+        } catch (InvocationTargetException e) {
+          logger.warn("Unexpected error calling getDefaultReplication(Path)", e);
+        }
+      } else {
+        // will not work on Federated HDFS (see HADOOP-8014)
+        replication = fs.getDefaultReplication();
+      }
+    }
+    return replication;
+  }
+
+  /**
+   * This method gets the datanode replication count for the current open file.
+   *
+   * If the pipeline isn't started yet or is empty, you will get the default
+   * replication factor.
+   *
+   * If this function returns -1, it means you
+   * are not properly running with the HDFS-826 patch.
+   * @throws InvocationTargetException
+   * @throws IllegalAccessException
+   * @throws IllegalArgumentException
+   */
+  public int getNumCurrentReplicas()
+      throws IllegalArgumentException, IllegalAccessException,
+          InvocationTargetException {
+    if (refGetNumCurrentReplicas != null && outputStream != null) {
+      OutputStream dfsOutputStream = outputStream.getWrappedStream();
+      if (dfsOutputStream != null) {
+        Object repl = refGetNumCurrentReplicas.invoke(dfsOutputStream, NO_ARGS);
+        if (repl instanceof Integer) {
+          return ((Integer)repl).intValue();
+        }
+      }
+    }
+    return -1;
+  }
+
+  /**
+   * Find the 'getNumCurrentReplicas' method on the passed os stream.
+   * @return Method or null.
+   */
+  private Method reflectGetNumCurrentReplicas(FSDataOutputStream os) {
+    Method m = null;
+    if (os != null) {
+      Class wrappedStreamClass = os.getWrappedStream()
+          .getClass();
+      try {
+        m = wrappedStreamClass.getDeclaredMethod("getNumCurrentReplicas",
+            new Class[] {});
+        m.setAccessible(true);
+      } catch (NoSuchMethodException e) {
+        logger.info("FileSystem's output stream doesn't support"
+            + " getNumCurrentReplicas; --HDFS-826 not available; fsOut="
+            + wrappedStreamClass.getName() + "; err=" + e);
+      } catch (SecurityException e) {
+        logger.info("Doesn't have access to getNumCurrentReplicas on "
+            + "FileSystem's output stream --HDFS-826 not available; fsOut="
+            + wrappedStreamClass.getName(), e);
+        m = null; // could happen on setAccessible()
+      }
+    }
+    if (m != null) {
+      logger.debug("Using getNumCurrentReplicas--HDFS-826");
+    }
+    return m;
+  }
+
+  /**
+   * Find the 'getDefaultReplication' method on the passed fs
+   * FileSystem that takes a Path argument.
+   * @return Method or null.
+   */
+  private Method reflectGetDefaultReplication(FileSystem fileSystem) {
+    Method m = null;
+    if (fileSystem != null) {
+      Class fsClass = fileSystem.getClass();
+      try {
+        m = fsClass.getMethod("getDefaultReplication",
+            new Class[] { Path.class });
+      } catch (NoSuchMethodException e) {
+        logger.debug("FileSystem implementation doesn't support"
+            + " getDefaultReplication(Path); -- HADOOP-8014 not available; " +
+            "className = " + fsClass.getName() + "; err = " + e);
+      } catch (SecurityException e) {
+        logger.debug("No access to getDefaultReplication(Path) on " +
+            "FileSystem implementation -- HADOOP-8014 not available; " +
+            "className = " + fsClass.getName() + "; err = " + e);
+      }
+    }
+    if (m != null) {
+      logger.debug("Using FileSystem.getDefaultReplication(Path) from " +
+          "HADOOP-8014");
+    }
+    return m;
+  }
+
+  private Method reflectHflushOrSync(FSDataOutputStream os) {
+    Method m = null;
+    if (os != null) {
+      Class fsDataOutputStreamClass = os.getClass();
+      try {
+        m = fsDataOutputStreamClass.getMethod("hflush");
+      } catch (NoSuchMethodException ex) {
+        logger.debug("HFlush not found. Will use sync() instead");
+        try {
+          m = fsDataOutputStreamClass.getMethod("sync");
+        } catch (Exception ex1) {
+          String msg = "Neither hflush nor sync were found. That seems to be " +
+              "a problem!";
+          logger.error(msg);
+          throw new FlumeException(msg, ex1);
+        }
+      }
+    }
+    return m;
+  }
+
+  /**
+   * If hflush is available in this version of HDFS, then this method calls
+   * hflush, else it calls sync.
+   * @param os - The stream to flush/sync
+   * @throws IOException
+   */
+  protected void hflushOrSync(FSDataOutputStream os) throws IOException {
+    try {
+      // At this point the refHflushOrSync cannot be null,
+      // since register method would have thrown if it was.
+      this.refHflushOrSync.invoke(os);
+    } catch (InvocationTargetException e) {
+      String msg = "Error while trying to hflushOrSync!";
+      logger.error(msg);
+      Throwable cause = e.getCause();
+      if (cause != null && cause instanceof IOException) {
+        throw (IOException)cause;
+      }
+      throw new FlumeException(msg, e);
+    } catch (Exception e) {
+      String msg = "Error while trying to hflushOrSync!";
+      logger.error(msg);
+      throw new FlumeException(msg, e);
+    }
+  }
+}
diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/AvroEventSerializer.java b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/AvroEventSerializer.java
new file mode 100644
index 0000000..3231742
--- /dev/null
+++ b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/AvroEventSerializer.java
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */ +package org.apache.flume.sink.hdfs; + +import org.apache.avro.AvroRuntimeException; +import org.apache.avro.Schema; +import org.apache.avro.file.CodecFactory; +import org.apache.avro.file.DataFileWriter; +import org.apache.avro.generic.GenericDatumWriter; +import org.apache.avro.io.DatumWriter; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.FlumeException; +import org.apache.flume.conf.Configurable; +import org.apache.flume.serialization.EventSerializer; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.net.URL; +import java.nio.ByteBuffer; +import java.util.HashMap; +import java.util.Locale; +import java.util.Map; + +import static org.apache.flume.serialization.AvroEventSerializerConfigurationConstants.COMPRESSION_CODEC; +import static org.apache.flume.serialization.AvroEventSerializerConfigurationConstants.DEFAULT_COMPRESSION_CODEC; +import static org.apache.flume.serialization.AvroEventSerializerConfigurationConstants.DEFAULT_STATIC_SCHEMA_URL; +import static org.apache.flume.serialization.AvroEventSerializerConfigurationConstants.DEFAULT_SYNC_INTERVAL_BYTES; +import static org.apache.flume.serialization.AvroEventSerializerConfigurationConstants.STATIC_SCHEMA_URL; +import static org.apache.flume.serialization.AvroEventSerializerConfigurationConstants.SYNC_INTERVAL_BYTES; + +/** + *

+ * This class serializes Flume {@linkplain org.apache.flume.Event events} into Avro data files. The + * Flume event body is read as an Avro datum, and is then written to the + * {@link org.apache.flume.serialization.EventSerializer}'s output stream in Avro data file format. + *
+ * The Avro schema is determined by reading the headers of the first Flume event written. The schema may be + * specified either as a literal, by setting {@link #AVRO_SCHEMA_LITERAL_HEADER} (not + * recommended, since the full schema must be transmitted in every event), + * or as a URL from which the schema may be read, by setting {@link + * #AVRO_SCHEMA_URL_HEADER}. If both headers are set, the URL takes precedence; if neither is set, + * the statically configured schema URL is used when one is configured. Schemas read from URLs are + * cached by instances of this class so that the overhead of retrieval is minimized. + *

+ */ +public class AvroEventSerializer implements EventSerializer, Configurable { + + private static final Logger logger = + LoggerFactory.getLogger(AvroEventSerializer.class); + + public static final String AVRO_SCHEMA_LITERAL_HEADER = "flume.avro.schema.literal"; + public static final String AVRO_SCHEMA_URL_HEADER = "flume.avro.schema.url"; + + private final OutputStream out; + private DatumWriter writer = null; + private DataFileWriter dataFileWriter = null; + + private int syncIntervalBytes; + private String compressionCodec; + private Map schemaCache = new HashMap(); + private String staticSchemaURL; + + private AvroEventSerializer(OutputStream out) { + this.out = out; + } + + @Override + public void configure(Context context) { + syncIntervalBytes = + context.getInteger(SYNC_INTERVAL_BYTES, DEFAULT_SYNC_INTERVAL_BYTES); + compressionCodec = + context.getString(COMPRESSION_CODEC, DEFAULT_COMPRESSION_CODEC); + staticSchemaURL = context.getString(STATIC_SCHEMA_URL, DEFAULT_STATIC_SCHEMA_URL); + } + + @Override + public void afterCreate() throws IOException { + // no-op + } + + @Override + public void afterReopen() throws IOException { + // impossible to initialize DataFileWriter without writing the schema? 
+ throw new UnsupportedOperationException("Avro API doesn't support append"); + } + + @Override + public void write(Event event) throws IOException { + if (dataFileWriter == null) { + initialize(event); + } + dataFileWriter.appendEncoded(ByteBuffer.wrap(event.getBody())); + } + + private void initialize(Event event) throws IOException { + Schema schema = null; + String schemaUrl = event.getHeaders().get(AVRO_SCHEMA_URL_HEADER); + String schemaString = event.getHeaders().get(AVRO_SCHEMA_LITERAL_HEADER); + + if (schemaUrl != null) { // if URL_HEADER is there then use it + schema = schemaCache.get(schemaUrl); + if (schema == null) { + schema = loadFromUrl(schemaUrl); + schemaCache.put(schemaUrl, schema); + } + } else if (schemaString != null) { // fallback to LITERAL_HEADER if it was there + schema = new Schema.Parser().parse(schemaString); + } else if (staticSchemaURL != null) { // fallback to static url if it was there + schema = schemaCache.get(staticSchemaURL); + if (schema == null) { + schema = loadFromUrl(staticSchemaURL); + schemaCache.put(staticSchemaURL, schema); + } + } else { // no other options so giving up + throw new FlumeException("Could not find schema for event " + event); + } + + writer = new GenericDatumWriter(schema); + dataFileWriter = new DataFileWriter(writer); + + dataFileWriter.setSyncInterval(syncIntervalBytes); + + try { + CodecFactory codecFactory = CodecFactory.fromString(compressionCodec); + dataFileWriter.setCodec(codecFactory); + } catch (AvroRuntimeException e) { + logger.warn("Unable to instantiate avro codec with name (" + + compressionCodec + "). Compression disabled. 
Exception follows.", e); + } + + dataFileWriter.create(schema, out); + } + + private Schema loadFromUrl(String schemaUrl) throws IOException { + Configuration conf = new Configuration(); + Schema.Parser parser = new Schema.Parser(); + if (schemaUrl.toLowerCase(Locale.ENGLISH).startsWith("hdfs://")) { + FileSystem fs = FileSystem.get(conf); + FSDataInputStream input = null; + try { + input = fs.open(new Path(schemaUrl)); + return parser.parse(input); + } finally { + if (input != null) { + input.close(); + } + } + } else { + InputStream is = null; + try { + is = new URL(schemaUrl).openStream(); + return parser.parse(is); + } finally { + if (is != null) { + is.close(); + } + } + } + } + + @Override + public void flush() throws IOException { + dataFileWriter.flush(); + } + + @Override + public void beforeClose() throws IOException { + // no-op + } + + @Override + public boolean supportsReopen() { + return false; + } + + public static class Builder implements EventSerializer.Builder { + + @Override + public EventSerializer build(Context context, OutputStream out) { + AvroEventSerializer writer = new AvroEventSerializer(out); + writer.configure(context); + return writer; + } + + } + +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketClosedException.java b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketClosedException.java new file mode 100644 index 0000000..1d8a9e4 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketClosedException.java @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.hdfs; + +import org.apache.flume.FlumeException; + +public class BucketClosedException extends FlumeException { + + private static final long serialVersionUID = -4216667125119540357L; + + public BucketClosedException(String msg) { + super(msg); + } +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java new file mode 100644 index 0000000..b096410 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java @@ -0,0 +1,717 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flume.sink.hdfs; + +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Throwables; +import org.apache.flume.Clock; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.SystemClock; +import org.apache.flume.auth.PrivilegedExecutor; +import org.apache.flume.instrumentation.SinkCounter; +import org.apache.flume.sink.hdfs.HDFSEventSink.WriterCallback; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.SequenceFile.CompressionType; +import org.apache.hadoop.io.compress.CompressionCodec; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.lang.reflect.Method; +import java.security.PrivilegedExceptionAction; +import java.util.concurrent.Callable; +import java.util.concurrent.CancellationException; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Future; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.ScheduledFuture; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.TimeoutException; +import java.util.concurrent.atomic.AtomicInteger; +import java.util.concurrent.atomic.AtomicLong; + +/** + * Internal API intended for HDFSSink use. + * This class does file rolling and handles file formats and serialization. + * Only the public methods in this class are thread safe. + */ +class BucketWriter { + + private static final Logger LOG = LoggerFactory + .getLogger(BucketWriter.class); + + /** + * This lock ensures that only one thread can open a file at a time. 
+ */ + private static final Integer staticLock = new Integer(1); + private Method isClosedMethod = null; + + private HDFSWriter writer; + private final long rollInterval; + private final long rollSize; + private final long rollCount; + private final long batchSize; + private final CompressionCodec codeC; + private final CompressionType compType; + private final ScheduledExecutorService timedRollerPool; + private final PrivilegedExecutor proxyUser; + + private final AtomicLong fileExtensionCounter; + + private long eventCounter; + private long processSize; + + private FileSystem fileSystem; + + private volatile String filePath; + private volatile String fileName; + private volatile String inUsePrefix; + private volatile String inUseSuffix; + private volatile String fileSuffix; + private volatile String bucketPath; + private volatile String targetPath; + private volatile long batchCounter; + private volatile boolean isOpen; + private volatile boolean isUnderReplicated; + private volatile int consecutiveUnderReplRotateCount = 0; + private volatile ScheduledFuture timedRollFuture; + private SinkCounter sinkCounter; + private final int idleTimeout; + private volatile ScheduledFuture idleFuture; + private final WriterCallback onCloseCallback; + private final String onCloseCallbackPath; + private final long callTimeout; + private final ExecutorService callTimeoutPool; + private final int maxConsecUnderReplRotations = 30; // make this config'able? + + private boolean mockFsInjected = false; + + private Clock clock = new SystemClock(); + private final long retryInterval; + private final int maxRenameTries; + + // flag that the bucket writer was closed due to idling and thus shouldn't be + // reopened. 
Not ideal, but avoids internals of owners + protected boolean closed = false; + AtomicInteger renameTries = new AtomicInteger(0); + + BucketWriter(long rollInterval, long rollSize, long rollCount, long batchSize, + Context context, String filePath, String fileName, String inUsePrefix, + String inUseSuffix, String fileSuffix, CompressionCodec codeC, + CompressionType compType, HDFSWriter writer, + ScheduledExecutorService timedRollerPool, PrivilegedExecutor proxyUser, + SinkCounter sinkCounter, int idleTimeout, WriterCallback onCloseCallback, + String onCloseCallbackPath, long callTimeout, + ExecutorService callTimeoutPool, long retryInterval, + int maxCloseTries) { + this.rollInterval = rollInterval; + this.rollSize = rollSize; + this.rollCount = rollCount; + this.batchSize = batchSize; + this.filePath = filePath; + this.fileName = fileName; + this.inUsePrefix = inUsePrefix; + this.inUseSuffix = inUseSuffix; + this.fileSuffix = fileSuffix; + this.codeC = codeC; + this.compType = compType; + this.writer = writer; + this.timedRollerPool = timedRollerPool; + this.proxyUser = proxyUser; + this.sinkCounter = sinkCounter; + this.idleTimeout = idleTimeout; + this.onCloseCallback = onCloseCallback; + this.onCloseCallbackPath = onCloseCallbackPath; + this.callTimeout = callTimeout; + this.callTimeoutPool = callTimeoutPool; + fileExtensionCounter = new AtomicLong(clock.currentTimeMillis()); + + this.retryInterval = retryInterval; + this.maxRenameTries = maxCloseTries; + isOpen = false; + isUnderReplicated = false; + this.writer.configure(context); + } + + @VisibleForTesting + void setFileSystem(FileSystem fs) { + this.fileSystem = fs; + mockFsInjected = true; + } + + @VisibleForTesting + void setMockStream(HDFSWriter dataWriter) { + this.writer = dataWriter; + } + + + /** + * Clear the class counters + */ + private void resetCounters() { + eventCounter = 0; + processSize = 0; + batchCounter = 0; + } + + private Method getRefIsClosed() { + try { + return 
fileSystem.getClass().getMethod("isFileClosed", + Path.class); + } catch (Exception e) { + LOG.warn("isFileClosed is not available in the " + + "version of HDFS being used. Flume will not " + + "attempt to close files if the close fails on " + + "the first attempt",e); + return null; + } + } + + private Boolean isFileClosed(FileSystem fs, Path tmpFilePath) throws Exception { + return (Boolean)(isClosedMethod.invoke(fs, tmpFilePath)); + } + + /** + * open() is called by append() + * @throws IOException + * @throws InterruptedException + */ + private void open() throws IOException, InterruptedException { + if ((filePath == null) || (writer == null)) { + throw new IOException("Invalid file settings"); + } + + final Configuration config = new Configuration(); + // disable FileSystem JVM shutdown hook + config.setBoolean("fs.automatic.close", false); + + // Hadoop is not thread safe when doing certain RPC operations, + // including getFileSystem(), when running under Kerberos. + // open() must be called by one thread at a time in the JVM. + // NOTE: tried synchronizing on the underlying Kerberos principal previously + // which caused deadlocks. See FLUME-1231. + synchronized (staticLock) { + checkAndThrowInterruptedException(); + + try { + long counter = fileExtensionCounter.incrementAndGet(); + + String fullFileName = fileName + "." 
+ counter; + + if (fileSuffix != null && fileSuffix.length() > 0) { + fullFileName += fileSuffix; + } else if (codeC != null) { + fullFileName += codeC.getDefaultExtension(); + } + + bucketPath = filePath + "/" + inUsePrefix + + fullFileName + inUseSuffix; + targetPath = filePath + "/" + fullFileName; + + LOG.info("Creating " + bucketPath); + callWithTimeout(new CallRunner() { + @Override + public Void call() throws Exception { + if (codeC == null) { + // Need to get reference to FS using above config before underlying + // writer does in order to avoid shutdown hook & + // IllegalStateExceptions + if (!mockFsInjected) { + fileSystem = new Path(bucketPath).getFileSystem(config); + } + writer.open(bucketPath); + } else { + // need to get reference to FS before writer does to + // avoid shutdown hook + if (!mockFsInjected) { + fileSystem = new Path(bucketPath).getFileSystem(config); + } + writer.open(bucketPath, codeC, compType); + } + return null; + } + }); + } catch (Exception ex) { + sinkCounter.incrementConnectionFailedCount(); + if (ex instanceof IOException) { + throw (IOException) ex; + } else { + throw Throwables.propagate(ex); + } + } + } + isClosedMethod = getRefIsClosed(); + sinkCounter.incrementConnectionCreatedCount(); + resetCounters(); + + // if time-based rolling is enabled, schedule the roll + if (rollInterval > 0) { + Callable action = new Callable() { + public Void call() throws Exception { + LOG.debug("Rolling file ({}): Roll scheduled after {} sec elapsed.", + bucketPath, rollInterval); + try { + // Roll the file and remove reference from sfWriters map. + close(true); + } catch (Throwable t) { + LOG.error("Unexpected error", t); + } + return null; + } + }; + timedRollFuture = timedRollerPool.schedule(action, rollInterval, + TimeUnit.SECONDS); + } + + isOpen = true; + } + + /** + * Close the file handle and rename the temp file to the permanent filename. + * Safe to call multiple times. Logs HDFSWriter.close() exceptions. 
This + * method will not cause the bucket writer to be dereferenced from the HDFS + * sink that owns it. This method should be used only when size or count + * based rolling closes this file. + * @throws IOException On failure to rename if temp file exists. + * @throws InterruptedException + */ + public synchronized void close() throws IOException, InterruptedException { + close(false); + } + + private CallRunner createCloseCallRunner() { + return new CallRunner() { + private final HDFSWriter localWriter = writer; + @Override + public Void call() throws Exception { + localWriter.close(); // could block + return null; + } + }; + } + + private Callable createScheduledRenameCallable() { + + return new Callable() { + private final String path = bucketPath; + private final String finalPath = targetPath; + private FileSystem fs = fileSystem; + private int renameTries = 1; // one attempt is already done + + @Override + public Void call() throws Exception { + if (renameTries >= maxRenameTries) { + LOG.warn("Unsuccessfully attempted to rename " + path + " " + + maxRenameTries + " times. File may still be open."); + return null; + } + renameTries++; + try { + renameBucket(path, finalPath, fs); + } catch (Exception e) { + LOG.warn("Renaming file: " + path + " failed. Will " + + "retry again in " + retryInterval + " seconds.", e); + timedRollerPool.schedule(this, retryInterval, TimeUnit.SECONDS); + return null; + } + return null; + } + }; + } + + /** + * Close the file handle and rename the temp file to the permanent filename. + * Safe to call multiple times. Logs HDFSWriter.close() exceptions. + * @throws IOException On failure to rename if temp file exists. 
+ * @throws InterruptedException + */ + public synchronized void close(boolean callCloseCallback) + throws IOException, InterruptedException { + checkAndThrowInterruptedException(); + try { + flush(); + } catch (IOException e) { + LOG.warn("pre-close flush failed", e); + } + boolean failedToClose = false; + LOG.info("Closing {}", bucketPath); + CallRunner closeCallRunner = createCloseCallRunner(); + if (isOpen) { + try { + callWithTimeout(closeCallRunner); + sinkCounter.incrementConnectionClosedCount(); + } catch (IOException e) { + LOG.warn("failed to close() HDFSWriter for file (" + bucketPath + + "). Exception follows.", e); + sinkCounter.incrementConnectionFailedCount(); + failedToClose = true; + } + isOpen = false; + } else { + LOG.info("HDFSWriter is already closed: {}", bucketPath); + } + + // NOTE: timed rolls go through this codepath as well as other roll types + if (timedRollFuture != null && !timedRollFuture.isDone()) { + timedRollFuture.cancel(false); // do not cancel myself if running! + timedRollFuture = null; + } + + if (idleFuture != null && !idleFuture.isDone()) { + idleFuture.cancel(false); // do not cancel myself if running! + idleFuture = null; + } + + if (bucketPath != null && fileSystem != null) { + // could block or throw IOException + try { + renameBucket(bucketPath, targetPath, fileSystem); + } catch (Exception e) { + LOG.warn("failed to rename() file (" + bucketPath + + "). 
Exception follows.", e); + sinkCounter.incrementConnectionFailedCount(); + final Callable scheduledRename = createScheduledRenameCallable(); + timedRollerPool.schedule(scheduledRename, retryInterval, TimeUnit.SECONDS); + } + } + if (callCloseCallback) { + runCloseAction(); + closed = true; + } + } + + /** + * flush the data + * @throws IOException + * @throws InterruptedException + */ + public synchronized void flush() throws IOException, InterruptedException { + checkAndThrowInterruptedException(); + if (!isBatchComplete()) { + doFlush(); + + if (idleTimeout > 0) { + // if the future exists and couldn't be cancelled, that would mean it has already run + // or been cancelled + if (idleFuture == null || idleFuture.cancel(false)) { + Callable idleAction = new Callable() { + public Void call() throws Exception { + LOG.info("Closing idle bucketWriter {} at {}", bucketPath, + System.currentTimeMillis()); + if (isOpen) { + close(true); + } + return null; + } + }; + idleFuture = timedRollerPool.schedule(idleAction, idleTimeout, + TimeUnit.SECONDS); + } + } + } + } + + private void runCloseAction() { + try { + if (onCloseCallback != null) { + onCloseCallback.run(onCloseCallbackPath); + } + } catch (Throwable t) { + LOG.error("Unexpected error", t); + } + } + + /** + * doFlush() must only be called by flush() + * @throws IOException + */ + private void doFlush() throws IOException, InterruptedException { + callWithTimeout(new CallRunner() { + @Override + public Void call() throws Exception { + writer.sync(); // could block + return null; + } + }); + batchCounter = 0; + } + + /** + * Open file handles, write data, update stats, handle file rolling and + * batching / flushing.
+ * If the write fails, the file is implicitly closed and then the IOException + * is rethrown.
+ * We rotate before append, and not after, so that the active file rolling + * mechanism will never roll an empty file. This also ensures that the file + * creation time reflects when the first event was written. + * + * @throws IOException + * @throws InterruptedException + */ + public synchronized void append(final Event event) + throws IOException, InterruptedException { + checkAndThrowInterruptedException(); + // If idleFuture is not null, cancel it before we move forward to avoid a + // close call in the middle of the append. + if (idleFuture != null) { + idleFuture.cancel(false); + // There is still a small race condition - if the idleFuture is already + // running, interrupting it can cause HDFS close operation to throw - + // so we cannot interrupt it while running. If the future could not be + // cancelled, it is already running - wait for it to finish before + // attempting to write. + if (!idleFuture.isDone()) { + try { + idleFuture.get(callTimeout, TimeUnit.MILLISECONDS); + } catch (TimeoutException ex) { + LOG.warn("Timeout while trying to cancel closing of idle file. Idle" + + " file close may have failed", ex); + } catch (Exception ex) { + LOG.warn("Error while trying to cancel closing of idle file. ", ex); + } + } + idleFuture = null; + } + + // If the bucket writer was closed due to roll timeout or idle timeout, + // force a new bucket writer to be created. 
Roll count and roll size will + // just reuse this one + if (!isOpen) { + if (closed) { + throw new BucketClosedException("This bucket writer was closed and " + + "this handle is thus no longer valid"); + } + open(); + } + + // check if it's time to rotate the file + if (shouldRotate()) { + boolean doRotate = true; + + if (isUnderReplicated) { + if (maxConsecUnderReplRotations > 0 && + consecutiveUnderReplRotateCount >= maxConsecUnderReplRotations) { + doRotate = false; + if (consecutiveUnderReplRotateCount == maxConsecUnderReplRotations) { + LOG.error("Hit max consecutive under-replication rotations ({}); " + + "will not continue rolling files under this path due to " + + "under-replication", maxConsecUnderReplRotations); + } + } else { + LOG.warn("Block Under-replication detected. Rotating file."); + } + consecutiveUnderReplRotateCount++; + } else { + consecutiveUnderReplRotateCount = 0; + } + + if (doRotate) { + close(); + open(); + } + } + + // write the event + try { + sinkCounter.incrementEventDrainAttemptCount(); + callWithTimeout(new CallRunner() { + @Override + public Void call() throws Exception { + writer.append(event); // could block + return null; + } + }); + } catch (IOException e) { + LOG.warn("Caught IOException writing to HDFSWriter ({}). Closing file (" + + bucketPath + ") and rethrowing exception.", + e.getMessage()); + try { + close(true); + } catch (IOException e2) { + LOG.warn("Caught IOException while closing file (" + + bucketPath + "). 
Exception follows.", e2); + } + throw e; + } + + // update statistics + processSize += event.getBody().length; + eventCounter++; + batchCounter++; + + if (batchCounter == batchSize) { + flush(); + } + } + + /** + * check if time to rotate the file + */ + private boolean shouldRotate() { + boolean doRotate = false; + + if (writer.isUnderReplicated()) { + this.isUnderReplicated = true; + doRotate = true; + } else { + this.isUnderReplicated = false; + } + + if ((rollCount > 0) && (rollCount <= eventCounter)) { + LOG.debug("rolling: rollCount: {}, events: {}", rollCount, eventCounter); + doRotate = true; + } + + if ((rollSize > 0) && (rollSize <= processSize)) { + LOG.debug("rolling: rollSize: {}, bytes: {}", rollSize, processSize); + doRotate = true; + } + + return doRotate; + } + + /** + * Rename bucketPath file from .tmp to permanent location. + */ + // When this bucket writer is rolled based on rollCount or + // rollSize, the same instance is reused for the new file. But if + // the previous file was not closed/renamed, + // the bucket writer fields no longer point to it and hence need + // to be passed in from the thread attempting to close it. Even + // when the bucket writer is closed due to close timeout, + // this method can get called from the scheduled thread so the + // file gets closed later - so an implicit reference to this + // bucket writer would still be alive in the Callable instance. 
+ private void renameBucket(String bucketPath, String targetPath, final FileSystem fs) + throws IOException, InterruptedException { + if (bucketPath.equals(targetPath)) { + return; + } + + final Path srcPath = new Path(bucketPath); + final Path dstPath = new Path(targetPath); + + callWithTimeout(new CallRunner() { + @Override + public Void call() throws Exception { + if (fs.exists(srcPath)) { // could block + LOG.info("Renaming " + srcPath + " to " + dstPath); + renameTries.incrementAndGet(); + fs.rename(srcPath, dstPath); // could block + } + return null; + } + }); + } + + @Override + public String toString() { + return "[ " + this.getClass().getSimpleName() + " targetPath = " + targetPath + + ", bucketPath = " + bucketPath + " ]"; + } + + private boolean isBatchComplete() { + return (batchCounter == 0); + } + + void setClock(Clock clock) { + this.clock = clock; + } + + /** + * This method checks if the current thread has been interrupted and, if so, + * throws an exception. + * @throws InterruptedException + */ + private static void checkAndThrowInterruptedException() + throws InterruptedException { + if (Thread.currentThread().interrupted()) { + throw new InterruptedException("Timed out before HDFS call was made. " + + "Your hdfs.callTimeout might be set too low or HDFS calls are " + + "taking too long."); + } + } + + /** + * Execute the callable on a separate thread and wait for the completion + * for the specified amount of time in milliseconds.
In case of timeout + * cancel the callable and throw an IOException + */ + private <T> T callWithTimeout(final CallRunner<T> callRunner) + throws IOException, InterruptedException { + Future<T> future = callTimeoutPool.submit(new Callable<T>() { + @Override + public T call() throws Exception { + return proxyUser.execute(new PrivilegedExceptionAction<T>() { + @Override + public T run() throws Exception { + return callRunner.call(); + } + }); + } + }); + try { + if (callTimeout > 0) { + return future.get(callTimeout, TimeUnit.MILLISECONDS); + } else { + return future.get(); + } + } catch (TimeoutException eT) { + future.cancel(true); + sinkCounter.incrementConnectionFailedCount(); + throw new IOException("Callable timed out after " + + callTimeout + " ms" + " on file: " + bucketPath, eT); + } catch (ExecutionException e1) { + sinkCounter.incrementConnectionFailedCount(); + Throwable cause = e1.getCause(); + if (cause instanceof IOException) { + throw (IOException) cause; + } else if (cause instanceof InterruptedException) { + throw (InterruptedException) cause; + } else if (cause instanceof RuntimeException) { + throw (RuntimeException) cause; + } else if (cause instanceof Error) { + throw (Error)cause; + } else { + throw new RuntimeException(e1); + } + } catch (CancellationException ce) { + throw new InterruptedException( + "Blocked callable interrupted by rotation event"); + } catch (InterruptedException ex) { + LOG.warn("Unexpected Exception " + ex.getMessage(), ex); + throw ex; + } + } + + /** + * Simple interface whose call method is called by + * {#callWithTimeout} in a new thread inside a + * {@linkplain java.security.PrivilegedExceptionAction#run()} call.
+ * @param <T> the type of the value returned by call + */ + private interface CallRunner<T> { + T call() throws Exception; + } + +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSCompressedDataStream.java b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSCompressedDataStream.java new file mode 100644 index 0000000..80b7cb4 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSCompressedDataStream.java @@ -0,0 +1,162 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +package org.apache.flume.sink.hdfs; + +import java.io.IOException; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.serialization.EventSerializer; +import org.apache.flume.serialization.EventSerializerFactory; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataOutputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.LocalFileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.SequenceFile.CompressionType; +import org.apache.hadoop.io.compress.CodecPool; +import org.apache.hadoop.io.compress.CompressionCodec; +import org.apache.hadoop.io.compress.CompressionOutputStream; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DefaultCodec; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class HDFSCompressedDataStream extends AbstractHDFSWriter { + + private static final Logger logger = + LoggerFactory.getLogger(HDFSCompressedDataStream.class); + + private FSDataOutputStream fsOut; + private CompressionOutputStream cmpOut; + private boolean isFinished = false; + + private String serializerType; + private Context serializerContext; + private EventSerializer serializer; + private boolean useRawLocalFileSystem; + private Compressor compressor; + + @Override + public void configure(Context context) { + super.configure(context); + + serializerType = context.getString("serializer", "TEXT"); + useRawLocalFileSystem = context.getBoolean("hdfs.useRawLocalFileSystem", + false); + serializerContext = new Context( + context.getSubProperties(EventSerializer.CTX_PREFIX)); + logger.info("Serializer = " + serializerType + ", UseRawLocalFileSystem = " + + useRawLocalFileSystem); + } + + @Override + public void open(String filePath) throws IOException { + DefaultCodec defCodec = new DefaultCodec(); + CompressionType cType = CompressionType.BLOCK; + open(filePath, defCodec, cType); + } + + @Override + public 
void open(String filePath, CompressionCodec codec, + CompressionType cType) throws IOException { + Configuration conf = new Configuration(); + Path dstPath = new Path(filePath); + FileSystem hdfs = dstPath.getFileSystem(conf); + if (useRawLocalFileSystem) { + if (hdfs instanceof LocalFileSystem) { + hdfs = ((LocalFileSystem)hdfs).getRaw(); + } else { + logger.warn("useRawLocalFileSystem is set to true but file system " + + "is not of type LocalFileSystem: " + hdfs.getClass().getName()); + } + } + boolean appending = false; + if (conf.getBoolean("hdfs.append.support", false) == true && hdfs.isFile(dstPath)) { + fsOut = hdfs.append(dstPath); + appending = true; + } else { + fsOut = hdfs.create(dstPath); + } + if (compressor == null) { + compressor = CodecPool.getCompressor(codec, conf); + } + cmpOut = codec.createOutputStream(fsOut, compressor); + serializer = EventSerializerFactory.getInstance(serializerType, + serializerContext, cmpOut); + if (appending && !serializer.supportsReopen()) { + cmpOut.close(); + serializer = null; + throw new IOException("serializer (" + serializerType + + ") does not support append"); + } + + registerCurrentStream(fsOut, hdfs, dstPath); + + if (appending) { + serializer.afterReopen(); + } else { + serializer.afterCreate(); + } + isFinished = false; + } + + @Override + public void append(Event e) throws IOException { + if (isFinished) { + cmpOut.resetState(); + isFinished = false; + } + serializer.write(e); + } + + @Override + public void sync() throws IOException { + // We must use finish() and resetState() here -- flush() is apparently not + // supported by the compressed output streams (it's a no-op). + // Also, since resetState() writes headers, avoid calling it without an + // additional write/append operation. + // Note: There are bugs in Hadoop & JDK w/ pure-java gzip; see HADOOP-8522. 
+ serializer.flush(); + if (!isFinished) { + cmpOut.finish(); + isFinished = true; + } + fsOut.flush(); + hflushOrSync(this.fsOut); + } + + @Override + public void close() throws IOException { + serializer.flush(); + serializer.beforeClose(); + if (!isFinished) { + cmpOut.finish(); + isFinished = true; + } + fsOut.flush(); + hflushOrSync(fsOut); + cmpOut.close(); + if (compressor != null) { + CodecPool.returnCompressor(compressor); + compressor = null; + } + unregisterCurrentStream(); + } + +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSDataStream.java b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSDataStream.java new file mode 100644 index 0000000..c4ad919 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSDataStream.java @@ -0,0 +1,140 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flume.sink.hdfs; + +import java.io.IOException; + +import com.google.common.annotations.VisibleForTesting; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.serialization.EventSerializer; +import org.apache.flume.serialization.EventSerializerFactory; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataOutputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.LocalFileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.SequenceFile.CompressionType; +import org.apache.hadoop.io.compress.CompressionCodec; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class HDFSDataStream extends AbstractHDFSWriter { + + private static final Logger logger = LoggerFactory.getLogger(HDFSDataStream.class); + + private FSDataOutputStream outStream; + private String serializerType; + private Context serializerContext; + private EventSerializer serializer; + private boolean useRawLocalFileSystem; + + @Override + public void configure(Context context) { + super.configure(context); + + serializerType = context.getString("serializer", "TEXT"); + useRawLocalFileSystem = context.getBoolean("hdfs.useRawLocalFileSystem", + false); + serializerContext = + new Context(context.getSubProperties(EventSerializer.CTX_PREFIX)); + logger.info("Serializer = " + serializerType + ", UseRawLocalFileSystem = " + + useRawLocalFileSystem); + } + + @VisibleForTesting + protected FileSystem getDfs(Configuration conf, Path dstPath) throws IOException { + return dstPath.getFileSystem(conf); + } + + protected void doOpen(Configuration conf, Path dstPath, FileSystem hdfs) throws IOException { + if (useRawLocalFileSystem) { + if (hdfs instanceof LocalFileSystem) { + hdfs = ((LocalFileSystem)hdfs).getRaw(); + } else { + logger.warn("useRawLocalFileSystem is set to true but file system " + + "is not of type LocalFileSystem: " + hdfs.getClass().getName()); + } 
+ } + + boolean appending = false; + if (conf.getBoolean("hdfs.append.support", false) == true && hdfs.isFile(dstPath)) { + outStream = hdfs.append(dstPath); + appending = true; + } else { + outStream = hdfs.create(dstPath); + } + + serializer = EventSerializerFactory.getInstance( + serializerType, serializerContext, outStream); + if (appending && !serializer.supportsReopen()) { + outStream.close(); + serializer = null; + throw new IOException("serializer (" + serializerType + + ") does not support append"); + } + + // must call superclass to check for replication issues + registerCurrentStream(outStream, hdfs, dstPath); + + if (appending) { + serializer.afterReopen(); + } else { + serializer.afterCreate(); + } + } + + @Override + public void open(String filePath) throws IOException { + Configuration conf = new Configuration(); + Path dstPath = new Path(filePath); + FileSystem hdfs = getDfs(conf, dstPath); + doOpen(conf, dstPath, hdfs); + } + + @Override + public void open(String filePath, CompressionCodec codec, + CompressionType cType) throws IOException { + open(filePath); + } + + @Override + public void append(Event e) throws IOException { + serializer.write(e); + } + + @Override + public void sync() throws IOException { + serializer.flush(); + outStream.flush(); + hflushOrSync(outStream); + } + + @Override + public void close() throws IOException { + serializer.flush(); + serializer.beforeClose(); + outStream.flush(); + hflushOrSync(outStream); + outStream.close(); + + unregisterCurrentStream(); + } + +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java new file mode 100644 index 0000000..741f01e --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java @@ -0,0 +1,559 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more 
contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flume.sink.hdfs; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Calendar; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; +import java.util.TimeZone; +import java.util.Map.Entry; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.TimeUnit; + +import com.google.common.annotations.VisibleForTesting; +import org.apache.flume.Channel; +import org.apache.flume.Clock; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.EventDeliveryException; +import org.apache.flume.SystemClock; +import org.apache.flume.Transaction; +import org.apache.flume.auth.FlumeAuthenticationUtil; +import org.apache.flume.auth.FlumeAuthenticator; +import org.apache.flume.auth.PrivilegedExecutor; +import org.apache.flume.conf.Configurable; +import org.apache.flume.formatter.output.BucketPath; +import org.apache.flume.instrumentation.SinkCounter; +import org.apache.flume.sink.AbstractSink; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.io.SequenceFile.CompressionType; 
+import org.apache.hadoop.io.compress.CompressionCodec; +import org.apache.hadoop.io.compress.CompressionCodecFactory; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.google.common.base.Preconditions; +import com.google.common.collect.Lists; +import com.google.common.util.concurrent.ThreadFactoryBuilder; + +public class HDFSEventSink extends AbstractSink implements Configurable { + public interface WriterCallback { + public void run(String filePath); + } + + private static final Logger LOG = LoggerFactory.getLogger(HDFSEventSink.class); + + private static String DIRECTORY_DELIMITER = System.getProperty("file.separator"); + + private static final long defaultRollInterval = 30; + private static final long defaultRollSize = 1024; + private static final long defaultRollCount = 10; + private static final String defaultFileName = "FlumeData"; + private static final String defaultSuffix = ""; + private static final String defaultInUsePrefix = ""; + private static final String defaultInUseSuffix = ".tmp"; + private static final long defaultBatchSize = 100; + private static final String defaultFileType = HDFSWriterFactory.SequenceFileType; + private static final int defaultMaxOpenFiles = 5000; + // Time between close retries, in seconds + private static final long defaultRetryInterval = 180; + // Retry forever. + private static final int defaultTryCount = Integer.MAX_VALUE; + + /** + * Default length of time we wait for blocking BucketWriter calls + * before timing out the operation. Intended to prevent server hangs. + */ + private static final long defaultCallTimeout = 10000; + /** + * Default number of threads available for tasks + * such as append/open/close/flush with hdfs. + * These tasks are done in a separate thread in + * the case that they take too long. In which + * case we create a new file and move on. 
+ */ + private static final int defaultThreadPoolSize = 10; + private static final int defaultRollTimerPoolSize = 1; + + private final HDFSWriterFactory writerFactory; + private WriterLinkedHashMap sfWriters; + + private long rollInterval; + private long rollSize; + private long rollCount; + private long batchSize; + private int threadsPoolSize; + private int rollTimerPoolSize; + private CompressionCodec codeC; + private CompressionType compType; + private String fileType; + private String filePath; + private String fileName; + private String suffix; + private String inUsePrefix; + private String inUseSuffix; + private TimeZone timeZone; + private int maxOpenFiles; + private ExecutorService callTimeoutPool; + private ScheduledExecutorService timedRollerPool; + + private boolean needRounding = false; + private int roundUnit = Calendar.SECOND; + private int roundValue = 1; + private boolean useLocalTime = false; + + private long callTimeout; + private Context context; + private SinkCounter sinkCounter; + + private volatile int idleTimeout; + private Clock clock; + private FileSystem mockFs; + private HDFSWriter mockWriter; + private final Object sfWritersLock = new Object(); + private long retryInterval; + private int tryCount; + private PrivilegedExecutor privExecutor; + + + /* + * Extended Java LinkedHashMap for open file handle LRU queue. + * We want to clear the oldest file handle if there are too many open ones. 
+ */ + private static class WriterLinkedHashMap + extends LinkedHashMap<String, BucketWriter> { + + private final int maxOpenFiles; + + public WriterLinkedHashMap(int maxOpenFiles) { + super(16, 0.75f, true); // stock initial capacity/load, access ordering + this.maxOpenFiles = maxOpenFiles; + } + + @Override + protected boolean removeEldestEntry(Entry<String, BucketWriter> eldest) { + if (size() > maxOpenFiles) { + // If we have more than max open files, then close the eldest one and + // return true + try { + eldest.getValue().close(); + } catch (IOException e) { + LOG.warn(eldest.getKey().toString(), e); + } catch (InterruptedException e) { + LOG.warn(eldest.getKey().toString(), e); + Thread.currentThread().interrupt(); + } + return true; + } else { + return false; + } + } + } + + public HDFSEventSink() { + this(new HDFSWriterFactory()); + } + + public HDFSEventSink(HDFSWriterFactory writerFactory) { + this.writerFactory = writerFactory; + } + + @VisibleForTesting + Map<String, BucketWriter> getSfWriters() { + return sfWriters; + } + + // read configuration and setup thresholds + @Override + public void configure(Context context) { + this.context = context; + + filePath = Preconditions.checkNotNull( + context.getString("hdfs.path"), "hdfs.path is required"); + fileName = context.getString("hdfs.filePrefix", defaultFileName); + this.suffix = context.getString("hdfs.fileSuffix", defaultSuffix); + inUsePrefix = context.getString("hdfs.inUsePrefix", defaultInUsePrefix); + inUseSuffix = context.getString("hdfs.inUseSuffix", defaultInUseSuffix); + String tzName = context.getString("hdfs.timeZone"); + timeZone = tzName == null ?
null : TimeZone.getTimeZone(tzName); + rollInterval = context.getLong("hdfs.rollInterval", defaultRollInterval); + rollSize = context.getLong("hdfs.rollSize", defaultRollSize); + rollCount = context.getLong("hdfs.rollCount", defaultRollCount); + batchSize = context.getLong("hdfs.batchSize", defaultBatchSize); + idleTimeout = context.getInteger("hdfs.idleTimeout", 0); + String codecName = context.getString("hdfs.codeC"); + fileType = context.getString("hdfs.fileType", defaultFileType); + maxOpenFiles = context.getInteger("hdfs.maxOpenFiles", defaultMaxOpenFiles); + callTimeout = context.getLong("hdfs.callTimeout", defaultCallTimeout); + threadsPoolSize = context.getInteger("hdfs.threadsPoolSize", + defaultThreadPoolSize); + rollTimerPoolSize = context.getInteger("hdfs.rollTimerPoolSize", + defaultRollTimerPoolSize); + String kerbConfPrincipal = context.getString("hdfs.kerberosPrincipal"); + String kerbKeytab = context.getString("hdfs.kerberosKeytab"); + String proxyUser = context.getString("hdfs.proxyUser"); + tryCount = context.getInteger("hdfs.closeTries", defaultTryCount); + if (tryCount <= 0) { + LOG.warn("Retry count value : " + tryCount + " is not " + + "valid. The sink will try to close the file until the file " + + "is eventually closed."); + tryCount = defaultTryCount; + } + retryInterval = context.getLong("hdfs.retryInterval", defaultRetryInterval); + if (retryInterval <= 0) { + LOG.warn("Retry Interval value: " + retryInterval + " is not " + + "valid. 
If the first close of a file fails, " + + "it may remain open and will not be renamed."); + tryCount = 1; + } + + Preconditions.checkArgument(batchSize > 0, "batchSize must be greater than 0"); + if (codecName == null) { + codeC = null; + compType = CompressionType.NONE; + } else { + codeC = getCodec(codecName); + // TODO : set proper compression type + compType = CompressionType.BLOCK; + } + + // Do not allow fileType DataStream to be combined with codeC, + // to prevent output files with a compression extension (like .snappy) + if (fileType.equalsIgnoreCase(HDFSWriterFactory.DataStreamType) && codecName != null) { + throw new IllegalArgumentException("fileType: " + fileType + + " which does NOT support compressed output. Please don't set codeC" + + " or change the fileType if compressed output is desired."); + } + + if (fileType.equalsIgnoreCase(HDFSWriterFactory.CompStreamType)) { + Preconditions.checkNotNull(codeC, "It's essential to set compress codec" + + " when fileType is: " + fileType); + } + + // get the appropriate executor + this.privExecutor = FlumeAuthenticationUtil.getAuthenticator( + kerbConfPrincipal, kerbKeytab).proxyAs(proxyUser); + + needRounding = context.getBoolean("hdfs.round", false); + + if (needRounding) { + String unit = context.getString("hdfs.roundUnit", "second"); + if (unit.equalsIgnoreCase("hour")) { + this.roundUnit = Calendar.HOUR_OF_DAY; + } else if (unit.equalsIgnoreCase("minute")) { + this.roundUnit = Calendar.MINUTE; + } else if (unit.equalsIgnoreCase("second")) { + this.roundUnit = Calendar.SECOND; + } else { + LOG.warn("Rounding unit is not valid, please set one of " + + "minute, hour, or second.
Rounding will be disabled"); + needRounding = false; + } + this.roundValue = context.getInteger("hdfs.roundValue", 1); + if (roundUnit == Calendar.SECOND || roundUnit == Calendar.MINUTE) { + Preconditions.checkArgument(roundValue > 0 && roundValue <= 60, + "Round value " + + "must be > 0 and <= 60"); + } else if (roundUnit == Calendar.HOUR_OF_DAY) { + Preconditions.checkArgument(roundValue > 0 && roundValue <= 24, + "Round value " + + "must be > 0 and <= 24"); + } + } + + this.useLocalTime = context.getBoolean("hdfs.useLocalTimeStamp", false); + if (useLocalTime) { + clock = new SystemClock(); + } + + if (sinkCounter == null) { + sinkCounter = new SinkCounter(getName()); + } + } + + private static boolean codecMatches(Class<? extends CompressionCodec> cls, String codecName) { + String simpleName = cls.getSimpleName(); + if (cls.getName().equals(codecName) || simpleName.equalsIgnoreCase(codecName)) { + return true; + } + if (simpleName.endsWith("Codec")) { + String prefix = simpleName.substring(0, simpleName.length() - "Codec".length()); + if (prefix.equalsIgnoreCase(codecName)) { + return true; + } + } + return false; + } + + @VisibleForTesting + static CompressionCodec getCodec(String codecName) { + Configuration conf = new Configuration(); + List<Class<? extends CompressionCodec>> codecs = CompressionCodecFactory.getCodecClasses(conf); + // Wish we could base this on DefaultCodec, but it appears that not all + // codecs extend DefaultCodec (e.g. Lzo) + CompressionCodec codec = null; + ArrayList<String> codecStrs = new ArrayList<String>(); + codecStrs.add("None"); + for (Class<? extends CompressionCodec> cls : codecs) { + codecStrs.add(cls.getSimpleName()); + if (codecMatches(cls, codecName)) { + try { + codec = cls.newInstance(); + } catch (InstantiationException e) { + LOG.error("Unable to instantiate " + cls + " class"); + } catch (IllegalAccessException e) { + LOG.error("Unable to access " + cls + " class"); + } + } + } + + if (codec == null) { + if (!codecName.equalsIgnoreCase("None")) { + throw new IllegalArgumentException("Unsupported compression codec " + + codecName + ".
Please choose from: " + codecStrs); + } + } else if (codec instanceof org.apache.hadoop.conf.Configurable) { + // Must check instanceof codec as BZip2Codec doesn't inherit Configurable + // Must set the configuration for Configurable objects that may or do use + // native libs + ((org.apache.hadoop.conf.Configurable) codec).setConf(conf); + } + return codec; + } + + + /** + * Pull events out of channel and send it to HDFS. Take at most batchSize + * events per Transaction. Find the corresponding bucket for the event. + * Ensure the file is open. Serialize the data and write it to the file on + * HDFS.
+ * This method is not thread safe. + */ + public Status process() throws EventDeliveryException { + Channel channel = getChannel(); + Transaction transaction = channel.getTransaction(); + List<BucketWriter> writers = Lists.newArrayList(); + transaction.begin(); + try { + int txnEventCount = 0; + for (txnEventCount = 0; txnEventCount < batchSize; txnEventCount++) { + Event event = channel.take(); + if (event == null) { + break; + } + + // reconstruct the path name by substituting place holders + String realPath = BucketPath.escapeString(filePath, event.getHeaders(), + timeZone, needRounding, roundUnit, roundValue, useLocalTime); + String realName = BucketPath.escapeString(fileName, event.getHeaders(), + timeZone, needRounding, roundUnit, roundValue, useLocalTime); + + String lookupPath = realPath + DIRECTORY_DELIMITER + realName; + BucketWriter bucketWriter; + HDFSWriter hdfsWriter = null; + // Callback to remove the reference to the bucket writer from the + // sfWriters map so that all buffers used by the HDFS file + // handles are garbage collected.
+ WriterCallback closeCallback = new WriterCallback() { + @Override + public void run(String bucketPath) { + LOG.info("Writer callback called."); + synchronized (sfWritersLock) { + sfWriters.remove(bucketPath); + } + } + }; + synchronized (sfWritersLock) { + bucketWriter = sfWriters.get(lookupPath); + // we haven't seen this file yet, so open it and cache the handle + if (bucketWriter == null) { + hdfsWriter = writerFactory.getWriter(fileType); + bucketWriter = initializeBucketWriter(realPath, realName, + lookupPath, hdfsWriter, closeCallback); + sfWriters.put(lookupPath, bucketWriter); + } + } + + // track the buckets getting written in this transaction + if (!writers.contains(bucketWriter)) { + writers.add(bucketWriter); + } + + // Write the data to HDFS + try { + bucketWriter.append(event); + } catch (BucketClosedException ex) { + LOG.info("Bucket was closed while trying to append, " + + "reinitializing bucket and writing event."); + hdfsWriter = writerFactory.getWriter(fileType); + bucketWriter = initializeBucketWriter(realPath, realName, + lookupPath, hdfsWriter, closeCallback); + synchronized (sfWritersLock) { + sfWriters.put(lookupPath, bucketWriter); + } + bucketWriter.append(event); + } + } + + if (txnEventCount == 0) { + sinkCounter.incrementBatchEmptyCount(); + } else if (txnEventCount == batchSize) { + sinkCounter.incrementBatchCompleteCount(); + } else { + sinkCounter.incrementBatchUnderflowCount(); + } + + // flush all pending buckets before committing the transaction + for (BucketWriter bucketWriter : writers) { + bucketWriter.flush(); + } + + transaction.commit(); + + if (txnEventCount < 1) { + return Status.BACKOFF; + } else { + sinkCounter.addToEventDrainSuccessCount(txnEventCount); + return Status.READY; + } + } catch (IOException eIO) { + transaction.rollback(); + LOG.warn("HDFS IO error", eIO); + return Status.BACKOFF; + } catch (Throwable th) { + transaction.rollback(); + LOG.error("process failed", th); + if (th instanceof Error) { + throw 
(Error) th; + } else { + throw new EventDeliveryException(th); + } + } finally { + transaction.close(); + } + } + + private BucketWriter initializeBucketWriter(String realPath, + String realName, String lookupPath, HDFSWriter hdfsWriter, + WriterCallback closeCallback) { + BucketWriter bucketWriter = new BucketWriter(rollInterval, + rollSize, rollCount, + batchSize, context, realPath, realName, inUsePrefix, inUseSuffix, + suffix, codeC, compType, hdfsWriter, timedRollerPool, + privExecutor, sinkCounter, idleTimeout, closeCallback, + lookupPath, callTimeout, callTimeoutPool, retryInterval, + tryCount); + if (mockFs != null) { + bucketWriter.setFileSystem(mockFs); + bucketWriter.setMockStream(mockWriter); + } + return bucketWriter; + } + + @Override + public void stop() { + // do not constrain close() calls with a timeout + synchronized (sfWritersLock) { + for (Entry<String, BucketWriter> entry : sfWriters.entrySet()) { + LOG.info("Closing {}", entry.getKey()); + + try { + entry.getValue().close(); + } catch (Exception ex) { + LOG.warn("Exception while closing " + entry.getKey() + ".
" + + "Exception follows.", ex); + if (ex instanceof InterruptedException) { + Thread.currentThread().interrupt(); + } + } + } + } + + // shut down all our thread pools + ExecutorService[] toShutdown = { callTimeoutPool, timedRollerPool }; + for (ExecutorService execService : toShutdown) { + execService.shutdown(); + try { + while (execService.isTerminated() == false) { + execService.awaitTermination( + Math.max(defaultCallTimeout, callTimeout), TimeUnit.MILLISECONDS); + } + } catch (InterruptedException ex) { + LOG.warn("shutdown interrupted on " + execService, ex); + } + } + + callTimeoutPool = null; + timedRollerPool = null; + + synchronized (sfWritersLock) { + sfWriters.clear(); + sfWriters = null; + } + sinkCounter.stop(); + super.stop(); + } + + @Override + public void start() { + String timeoutName = "hdfs-" + getName() + "-call-runner-%d"; + callTimeoutPool = Executors.newFixedThreadPool(threadsPoolSize, + new ThreadFactoryBuilder().setNameFormat(timeoutName).build()); + + String rollerName = "hdfs-" + getName() + "-roll-timer-%d"; + timedRollerPool = Executors.newScheduledThreadPool(rollTimerPoolSize, + new ThreadFactoryBuilder().setNameFormat(rollerName).build()); + + this.sfWriters = new WriterLinkedHashMap(maxOpenFiles); + sinkCounter.start(); + super.start(); + } + + @Override + public String toString() { + return "{ Sink type:" + getClass().getSimpleName() + ", name:" + getName() + + " }"; + } + + @VisibleForTesting + void setBucketClock(Clock clock) { + BucketPath.setClock(clock); + } + + @VisibleForTesting + void setMockFs(FileSystem mockFs) { + this.mockFs = mockFs; + } + + @VisibleForTesting + void setMockWriter(HDFSWriter writer) { + this.mockWriter = writer; + } + + @VisibleForTesting + int getTryCount() { + return tryCount; + } +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java new 
file mode 100644 index 0000000..c5430ba --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java @@ -0,0 +1,122 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flume.sink.hdfs; + +import java.io.IOException; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataOutputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.LocalFileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.SequenceFile; +import org.apache.hadoop.io.SequenceFile.CompressionType; +import org.apache.hadoop.io.compress.CompressionCodec; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class HDFSSequenceFile extends AbstractHDFSWriter { + + private static final Logger logger = + LoggerFactory.getLogger(HDFSSequenceFile.class); + private SequenceFile.Writer writer; + private String writeFormat; + private Context serializerContext; + private SequenceFileSerializer serializer; + private boolean useRawLocalFileSystem; + private FSDataOutputStream outStream = null; + + public HDFSSequenceFile() { + writer = 
null; + } + + @Override + public void configure(Context context) { + super.configure(context); + + // use binary writable serialize by default + writeFormat = context.getString("hdfs.writeFormat", + SequenceFileSerializerType.Writable.name()); + useRawLocalFileSystem = context.getBoolean("hdfs.useRawLocalFileSystem", + false); + serializerContext = new Context( + context.getSubProperties(SequenceFileSerializerFactory.CTX_PREFIX)); + serializer = SequenceFileSerializerFactory + .getSerializer(writeFormat, serializerContext); + logger.info("writeFormat = " + writeFormat + ", UseRawLocalFileSystem = " + + useRawLocalFileSystem); + } + + @Override + public void open(String filePath) throws IOException { + open(filePath, null, CompressionType.NONE); + } + + @Override + public void open(String filePath, CompressionCodec codeC, + CompressionType compType) throws IOException { + Configuration conf = new Configuration(); + Path dstPath = new Path(filePath); + FileSystem hdfs = dstPath.getFileSystem(conf); + open(dstPath, codeC, compType, conf, hdfs); + } + + protected void open(Path dstPath, CompressionCodec codeC, + CompressionType compType, Configuration conf, FileSystem hdfs) + throws IOException { + if (useRawLocalFileSystem) { + if (hdfs instanceof LocalFileSystem) { + hdfs = ((LocalFileSystem)hdfs).getRaw(); + } else { + logger.warn("useRawLocalFileSystem is set to true but file system " + + "is not of type LocalFileSystem: " + hdfs.getClass().getName()); + } + } + if (conf.getBoolean("hdfs.append.support", false) == true && hdfs.isFile(dstPath)) { + outStream = hdfs.append(dstPath); + } else { + outStream = hdfs.create(dstPath); + } + writer = SequenceFile.createWriter(conf, outStream, + serializer.getKeyClass(), serializer.getValueClass(), compType, codeC); + + registerCurrentStream(outStream, hdfs, dstPath); + } + + @Override + public void append(Event e) throws IOException { + for (SequenceFileSerializer.Record record : serializer.serialize(e)) { + 
writer.append(record.getKey(), record.getValue()); + } + } + + @Override + public void sync() throws IOException { + writer.sync(); + hflushOrSync(outStream); + } + + @Override + public void close() throws IOException { + writer.close(); + outStream.close(); + unregisterCurrentStream(); + } +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSTextSerializer.java b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSTextSerializer.java new file mode 100644 index 0000000..32fd206 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSTextSerializer.java @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.flume.sink.hdfs; + +import java.util.Collections; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.hadoop.io.Text; +import org.apache.hadoop.io.LongWritable; + +public class HDFSTextSerializer implements SequenceFileSerializer { + + private Text makeText(Event e) { + Text textObject = new Text(); + textObject.set(e.getBody(), 0, e.getBody().length); + return textObject; + } + + @Override + public Class<LongWritable> getKeyClass() { + return LongWritable.class; + } + + @Override + public Class<Text> getValueClass() { + return Text.class; + } + + @Override + public Iterable<Record> serialize(Event e) { + Object key = getKey(e); + Object value = getValue(e); + return Collections.singletonList(new Record(key, value)); + } + + private Object getKey(Event e) { + // Key on the event's "timestamp" header, falling back to the current time + String timestamp = e.getHeaders().get("timestamp"); + long eventStamp; + + if (timestamp == null) { + eventStamp = System.currentTimeMillis(); + } else { + eventStamp = Long.valueOf(timestamp); + } + return new LongWritable(eventStamp); + } + + private Object getValue(Event e) { + return makeText(e); + } + + public static class Builder implements SequenceFileSerializer.Builder { + + @Override + public SequenceFileSerializer build(Context context) { + return new HDFSTextSerializer(); + } + + } + +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSWritableSerializer.java b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSWritableSerializer.java new file mode 100644 index 0000000..b25a6ea --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSWritableSerializer.java @@ -0,0 +1,77 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership.
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.flume.sink.hdfs; + +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.hadoop.io.BytesWritable; +import org.apache.hadoop.io.LongWritable; + +import java.util.Collections; + +public class HDFSWritableSerializer implements SequenceFileSerializer { + + private BytesWritable makeByteWritable(Event e) { + BytesWritable bytesObject = new BytesWritable(); + bytesObject.set(e.getBody(), 0, e.getBody().length); + return bytesObject; + } + + @Override + public Class getKeyClass() { + return LongWritable.class; + } + + @Override + public Class getValueClass() { + return BytesWritable.class; + } + + @Override + public Iterable serialize(Event e) { + Object key = getKey(e); + Object value = getValue(e); + return Collections.singletonList(new Record(key, value)); + } + + private Object getKey(Event e) { + String timestamp = e.getHeaders().get("timestamp"); + long eventStamp; + + if (timestamp == null) { + eventStamp = System.currentTimeMillis(); + } else { + eventStamp = Long.valueOf(timestamp); + } + return new LongWritable(eventStamp); + } + + private Object getValue(Event e) { + return makeByteWritable(e); + } + + public static class Builder implements SequenceFileSerializer.Builder { + + @Override + public SequenceFileSerializer build(Context context) { + return new HDFSWritableSerializer(); + } + + } + +} diff --git 
a/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSWriter.java b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSWriter.java new file mode 100644 index 0000000..44a984a --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSWriter.java @@ -0,0 +1,47 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flume.sink.hdfs; + +import java.io.IOException; + +import org.apache.flume.Event; +import org.apache.flume.annotations.InterfaceAudience; +import org.apache.flume.annotations.InterfaceStability; +import org.apache.flume.conf.Configurable; +import org.apache.hadoop.io.SequenceFile.CompressionType; +import org.apache.hadoop.io.compress.CompressionCodec; + +@InterfaceAudience.Private +@InterfaceStability.Evolving +public interface HDFSWriter extends Configurable { + + public void open(String filePath) throws IOException; + + public void open(String filePath, CompressionCodec codec, + CompressionType cType) throws IOException; + + public void append(Event e) throws IOException; + + public void sync() throws IOException; + + public void close() throws IOException; + + public boolean isUnderReplicated(); + +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSWriterFactory.java b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSWriterFactory.java new file mode 100644 index 0000000..a90d536 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSWriterFactory.java @@ -0,0 +1,43 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flume.sink.hdfs; + +import java.io.IOException; + +public class HDFSWriterFactory { + static final String SequenceFileType = "SequenceFile"; + static final String DataStreamType = "DataStream"; + static final String CompStreamType = "CompressedStream"; + + public HDFSWriterFactory() { + + } + + public HDFSWriter getWriter(String fileType) throws IOException { + if (fileType.equalsIgnoreCase(SequenceFileType)) { + return new HDFSSequenceFile(); + } else if (fileType.equalsIgnoreCase(DataStreamType)) { + return new HDFSDataStream(); + } else if (fileType.equalsIgnoreCase(CompStreamType)) { + return new HDFSCompressedDataStream(); + } else { + throw new IOException("File type " + fileType + " not supported"); + } + } +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/KerberosUser.java b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/KerberosUser.java new file mode 100644 index 0000000..43297e2 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/KerberosUser.java @@ -0,0 +1,72 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with this + * work for additional information regarding copyright ownership. The ASF + * licenses this file to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
See the + * License for the specific language governing permissions and limitations under + * the License. + */ +package org.apache.flume.sink.hdfs; + +/** + * Simple Pair class used to define a unique (principal, keyTab) combination. + */ +public class KerberosUser { + + private final String principal; + private final String keyTab; + + public KerberosUser(String principal, String keyTab) { + this.principal = principal; + this.keyTab = keyTab; + } + + public String getPrincipal() { + return principal; + } + + public String getKeyTab() { + return keyTab; + } + + @Override + public boolean equals(Object obj) { + if (obj == null) { + return false; + } + if (getClass() != obj.getClass()) { + return false; + } + final KerberosUser other = (KerberosUser) obj; + if ((this.principal == null) ? + (other.principal != null) : + !this.principal.equals(other.principal)) { + return false; + } + if ((this.keyTab == null) ? (other.keyTab != null) : !this.keyTab.equals(other.keyTab)) { + return false; + } + return true; + } + + @Override + public int hashCode() { + int hash = 7; + hash = 41 * hash + (this.principal != null ? this.principal.hashCode() : 0); + hash = 41 * hash + (this.keyTab != null ? this.keyTab.hashCode() : 0); + return hash; + } + + @Override + public String toString() { + return "{ principal: " + principal + ", keytab: " + keyTab + " }"; + } +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/SequenceFileSerializer.java b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/SequenceFileSerializer.java new file mode 100644 index 0000000..ec2b760 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/SequenceFileSerializer.java @@ -0,0 +1,68 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flume.sink.hdfs; + +import org.apache.flume.Context; +import org.apache.flume.Event; + +public interface SequenceFileSerializer { + + Class getKeyClass(); + + Class getValueClass(); + + /** + * Format the given event into zero, one or more SequenceFile records + * + * @param e + * event + * @return a list of records corresponding to the given event + */ + Iterable serialize(Event e); + + /** + * Knows how to construct this output formatter.
+ * Note: Implementations MUST provide a public no-arg constructor. + */ + public interface Builder { + public SequenceFileSerializer build(Context context); + } + + /** + * A key-value pair making up a record in an HDFS SequenceFile + */ + public static class Record { + private final Object key; + private final Object value; + + public Record(Object key, Object value) { + this.key = key; + this.value = value; + } + + public Object getKey() { + return key; + } + + public Object getValue() { + return value; + } + } + +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/SequenceFileSerializerFactory.java b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/SequenceFileSerializerFactory.java new file mode 100644 index 0000000..5678836 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/SequenceFileSerializerFactory.java @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License.
+ */ +package org.apache.flume.sink.hdfs; + +import com.google.common.base.Preconditions; +import org.apache.flume.Context; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class SequenceFileSerializerFactory { + + private static final Logger logger = + LoggerFactory.getLogger(SequenceFileSerializerFactory.class); + + /** + * {@link Context} prefix + */ + static final String CTX_PREFIX = "writeFormat."; + + @SuppressWarnings("unchecked") + static SequenceFileSerializer getSerializer(String formatType, + Context context) { + + Preconditions.checkNotNull(formatType, + "serialize type must not be null"); + + // try to find builder class in enum of known formatters + SequenceFileSerializerType type; + try { + type = SequenceFileSerializerType.valueOf(formatType); + } catch (IllegalArgumentException e) { + logger.debug("Not in enum, loading builder class: {}", formatType); + type = SequenceFileSerializerType.Other; + } + Class builderClass = + type.getBuilderClass(); + + // handle the case where they have specified their own builder in the config + if (builderClass == null) { + try { + Class c = Class.forName(formatType); + if (c != null && SequenceFileSerializer.Builder.class.isAssignableFrom(c)) { + builderClass = (Class) c; + } else { + logger.error("Unable to instantiate Builder from {}", formatType); + return null; + } + } catch (ClassNotFoundException ex) { + logger.error("Class not found: " + formatType, ex); + return null; + } catch (ClassCastException ex) { + logger.error("Class does not extend " + + SequenceFileSerializer.Builder.class.getCanonicalName() + ": " + + formatType, ex); + return null; + } + } + + // build the builder + SequenceFileSerializer.Builder builder; + try { + builder = builderClass.newInstance(); + } catch (InstantiationException ex) { + logger.error("Cannot instantiate builder: " + formatType, ex); + return null; + } catch (IllegalAccessException ex) { + logger.error("Cannot instantiate builder: " + formatType, ex); + 
return null; + } + + return builder.build(context); + } + +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/SequenceFileSerializerType.java b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/SequenceFileSerializerType.java new file mode 100644 index 0000000..2ad7689 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/SequenceFileSerializerType.java @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.flume.sink.hdfs; + +public enum SequenceFileSerializerType { + Writable(HDFSWritableSerializer.Builder.class), + Text(HDFSTextSerializer.Builder.class), + Other(null); + + private final Class builderClass; + + SequenceFileSerializerType(Class builderClass) { + this.builderClass = builderClass; + } + + public Class getBuilderClass() { + return builderClass; + } + +} + diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadDataStream.java b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadDataStream.java new file mode 100644 index 0000000..d325233 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSBadDataStream.java @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.flume.sink.hdfs; + +import java.io.IOException; +import org.apache.flume.Event; + +public class HDFSBadDataStream extends HDFSDataStream { + public class HDFSBadSeqWriter extends HDFSSequenceFile { + @Override + public void append(Event e) throws IOException { + + if (e.getHeaders().containsKey("fault")) { + throw new IOException("Injected fault"); + } else if (e.getHeaders().containsKey("slow")) { + long waitTime = Long.parseLong(e.getHeaders().get("slow")); + try { + Thread.sleep(waitTime); + } catch (InterruptedException eT) { + throw new IOException("append interrupted", eT); + } + } + super.append(e); + } + + } + +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSTestSeqWriter.java b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSTestSeqWriter.java new file mode 100644 index 0000000..f1dadf1 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSTestSeqWriter.java @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.flume.sink.hdfs; + +import org.apache.flume.Event; +import org.apache.hadoop.io.SequenceFile.CompressionType; +import org.apache.hadoop.io.compress.CompressionCodec; + +import java.io.IOException; + +public class HDFSTestSeqWriter extends HDFSSequenceFile { + protected volatile boolean closed; + protected volatile boolean opened; + + private int openCount = 0; + + HDFSTestSeqWriter(int openCount) { + this.openCount = openCount; + } + + @Override + public void open(String filePath, CompressionCodec codeC, CompressionType compType) + throws IOException { + super.open(filePath, codeC, compType); + if (closed) { + opened = true; + } + } + + @Override + public void append(Event e) throws IOException { + + if (e.getHeaders().containsKey("fault")) { + throw new IOException("Injected fault"); + } else if (e.getHeaders().containsKey("fault-once")) { + e.getHeaders().remove("fault-once"); + throw new IOException("Injected fault"); + } else if (e.getHeaders().containsKey("fault-until-reopen")) { + // opening first time. 
+ if (openCount == 1) { + throw new IOException("Injected fault-until-reopen"); + } + } else if (e.getHeaders().containsKey("slow")) { + long waitTime = Long.parseLong(e.getHeaders().get("slow")); + try { + Thread.sleep(waitTime); + } catch (InterruptedException eT) { + throw new IOException("append interrupted", eT); + } + } + + super.append(e); + } + + @Override + public void close() throws IOException { + closed = true; + super.close(); + } +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSTestWriterFactory.java b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSTestWriterFactory.java new file mode 100644 index 0000000..70bd9e6 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/HDFSTestWriterFactory.java @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.flume.sink.hdfs; + +import java.io.IOException; +import java.util.concurrent.atomic.AtomicInteger; + +public class HDFSTestWriterFactory extends HDFSWriterFactory { + static final String TestSequenceFileType = "SequenceFile"; + static final String BadDataStreamType = "DataStream"; + + // so we can get a handle to this one in our test. + AtomicInteger openCount = new AtomicInteger(0); + + @Override + public HDFSWriter getWriter(String fileType) throws IOException { + // Compare by value; reference equality (==) only works for interned strings. + if (TestSequenceFileType.equals(fileType)) { + return new HDFSTestSeqWriter(openCount.incrementAndGet()); + } else if (BadDataStreamType.equals(fileType)) { + return new HDFSBadDataStream(); + } else { + throw new IOException("File type " + fileType + " not supported"); + } + } +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MockDataStream.java b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MockDataStream.java new file mode 100644 index 0000000..a85a99f --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MockDataStream.java @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License.
+ */ +package org.apache.flume.sink.hdfs; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; + +import java.io.IOException; + +class MockDataStream extends HDFSDataStream { + private final FileSystem fs; + + MockDataStream(FileSystem fs) { + this.fs = fs; + } + + @Override + protected FileSystem getDfs(Configuration conf, Path dstPath) throws IOException { + return fs; + } + +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MockFileSystem.java b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MockFileSystem.java new file mode 100644 index 0000000..a079b83 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MockFileSystem.java @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.flume.sink.hdfs; + +import java.io.IOException; +import java.net.URI; + +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FSDataOutputStream; +import org.apache.hadoop.fs.FileStatus; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.permission.FsPermission; +import org.apache.hadoop.util.Progressable; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class MockFileSystem extends FileSystem { + + private static final Logger logger = + LoggerFactory.getLogger(MockFileSystem.class); + + FileSystem fs; + int numberOfRetriesRequired; + MockFsDataOutputStream latestOutputStream; + int currentRenameAttempts; + boolean closeSucceed = true; + + public MockFileSystem(FileSystem fs, int numberOfRetriesRequired) { + this.fs = fs; + this.numberOfRetriesRequired = numberOfRetriesRequired; + } + + public MockFileSystem(FileSystem fs, + int numberOfRetriesRequired, boolean closeSucceed) { + this.fs = fs; + this.numberOfRetriesRequired = numberOfRetriesRequired; + this.closeSucceed = closeSucceed; + } + + @Override + public FSDataOutputStream append(Path arg0, int arg1, Progressable arg2) + throws IOException { + + latestOutputStream = new MockFsDataOutputStream( + fs.append(arg0, arg1, arg2), closeSucceed); + + return latestOutputStream; + } + + @Override + public FSDataOutputStream create(Path arg0) throws IOException { + latestOutputStream = new MockFsDataOutputStream(fs.create(arg0), closeSucceed); + return latestOutputStream; + } + + @Override + public FSDataOutputStream create(Path arg0, FsPermission arg1, boolean arg2, int arg3, + short arg4, long arg5, Progressable arg6) + throws IOException { + throw new IOException("Not a real file system"); + } + + @Override + @Deprecated + public boolean delete(Path arg0) throws IOException { + return fs.delete(arg0); + } + + @Override + public boolean delete(Path arg0, boolean arg1) throws IOException { + return 
fs.delete(arg0, arg1); + } + + @Override + public FileStatus getFileStatus(Path arg0) throws IOException { + return fs.getFileStatus(arg0); + } + + @Override + public URI getUri() { + return fs.getUri(); + } + + @Override + public Path getWorkingDirectory() { + return fs.getWorkingDirectory(); + } + + @Override + public FileStatus[] listStatus(Path arg0) throws IOException { + return fs.listStatus(arg0); + } + + @Override + public boolean mkdirs(Path arg0, FsPermission arg1) throws IOException { + // TODO Auto-generated method stub + return fs.mkdirs(arg0, arg1); + } + + @Override + public FSDataInputStream open(Path arg0, int arg1) throws IOException { + return fs.open(arg0, arg1); + } + + @Override + public boolean rename(Path arg0, Path arg1) throws IOException { + currentRenameAttempts++; + logger.info("Attempting to Rename: '" + currentRenameAttempts + "' of '" + + numberOfRetriesRequired + "'"); + if (currentRenameAttempts >= numberOfRetriesRequired || numberOfRetriesRequired == 0) { + logger.info("Renaming file"); + return fs.rename(arg0, arg1); + } else { + throw new IOException("MockIOException"); + } + } + + @Override + public void setWorkingDirectory(Path arg0) { + fs.setWorkingDirectory(arg0); + } +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MockFsDataOutputStream.java b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MockFsDataOutputStream.java new file mode 100644 index 0000000..f5d579c --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MockFsDataOutputStream.java @@ -0,0 +1,49 @@ +/** ++ * Licensed to the Apache Software Foundation (ASF) under one ++ * or more contributor license agreements. See the NOTICE file ++ * distributed with this work for additional information ++ * regarding copyright ownership. 
The ASF licenses this file ++ * to you under the Apache License, Version 2.0 (the ++ * "License"); you may not use this file except in compliance ++ * with the License. You may obtain a copy of the License at ++ * ++ * http://www.apache.org/licenses/LICENSE-2.0 ++ * ++ * Unless required by applicable law or agreed to in writing, software ++ * distributed under the License is distributed on an "AS IS" BASIS, ++ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ++ * See the License for the specific language governing permissions and ++ * limitations under the License. ++ */ +package org.apache.flume.sink.hdfs; + +import org.apache.hadoop.fs.FSDataOutputStream; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; + +public class MockFsDataOutputStream extends FSDataOutputStream { + + private static final Logger logger = + LoggerFactory.getLogger(MockFsDataOutputStream.class); + + boolean closeSucceed; + + public MockFsDataOutputStream(FSDataOutputStream wrapMe, boolean closeSucceed) + throws IOException { + super(wrapMe.getWrappedStream(), null); + this.closeSucceed = closeSucceed; + } + + @Override + public void close() throws IOException { + logger.info("Close Succeeded - " + closeSucceed); + if (closeSucceed) { + logger.info("closing file"); + super.close(); + } else { + throw new IOException("MockIOException"); + } + } +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MockHDFSWriter.java b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MockHDFSWriter.java new file mode 100644 index 0000000..05c4316 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MockHDFSWriter.java @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.hdfs; + +import java.io.IOException; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.hadoop.io.SequenceFile.CompressionType; +import org.apache.hadoop.io.compress.CompressionCodec; + +public class MockHDFSWriter implements HDFSWriter { + + private int filesOpened = 0; + private int filesClosed = 0; + private int bytesWritten = 0; + private int eventsWritten = 0; + private String filePath = null; + + public int getFilesOpened() { + return filesOpened; + } + + public int getFilesClosed() { + return filesClosed; + } + + public int getBytesWritten() { + return bytesWritten; + } + + public int getEventsWritten() { + return eventsWritten; + } + + public String getOpenedFilePath() { + return filePath; + } + + public void clear() { + filesOpened = 0; + filesClosed = 0; + bytesWritten = 0; + eventsWritten = 0; + } + + public void configure(Context context) { + // no-op + } + + public void open(String filePath) throws IOException { + this.filePath = filePath; + filesOpened++; + } + + public void open(String filePath, CompressionCodec codec, CompressionType cType) + throws IOException { + this.filePath = filePath; + filesOpened++; + } + + public void append(Event e) throws IOException { + eventsWritten++; + 
bytesWritten += e.getBody().length; + } + + public void sync() throws IOException { + // does nothing + } + + public void close() throws IOException { + filesClosed++; + } + + @Override + public boolean isUnderReplicated() { + return false; + } + +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MyCustomSerializer.java b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MyCustomSerializer.java new file mode 100644 index 0000000..72164fd --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MyCustomSerializer.java @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.flume.sink.hdfs; + +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.hadoop.io.BytesWritable; +import org.apache.hadoop.io.LongWritable; + +import java.util.Arrays; + +public class MyCustomSerializer implements SequenceFileSerializer { + + @Override + public Class getKeyClass() { + return LongWritable.class; + } + + @Override + public Class getValueClass() { + return BytesWritable.class; + } + + @Override + public Iterable serialize(Event e) { + return Arrays.asList( + new Record(new LongWritable(1234L), new BytesWritable(new byte[10])), + new Record(new LongWritable(4567L), new BytesWritable(new byte[20])) + ); + } + + public static class Builder implements SequenceFileSerializer.Builder { + + @Override + public SequenceFileSerializer build(Context context) { + return new MyCustomSerializer(); + } + + } + +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestAvroEventSerializer.java b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestAvroEventSerializer.java new file mode 100644 index 0000000..6b38da2 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestAvroEventSerializer.java @@ -0,0 +1,183 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.hdfs; + +import com.google.common.base.Charsets; +import com.google.common.io.Files; +import java.io.ByteArrayOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.io.OutputStream; +import java.util.Arrays; +import org.apache.avro.Schema; +import org.apache.avro.file.DataFileReader; +import org.apache.avro.generic.GenericData; +import org.apache.avro.generic.GenericDatumReader; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.generic.GenericRecordBuilder; +import org.apache.avro.io.BinaryEncoder; +import org.apache.avro.io.DatumReader; +import org.apache.avro.io.EncoderFactory; +import org.apache.avro.reflect.ReflectDatumWriter; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.event.EventBuilder; +import org.apache.flume.serialization.AvroEventSerializerConfigurationConstants; +import org.apache.flume.serialization.EventSerializer; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import org.junit.After; + +public class TestAvroEventSerializer { + + private File file; + + @Before + public void setUp() throws Exception { + file = File.createTempFile(getClass().getSimpleName(), ""); + } + + @After + public void tearDown() throws Exception { + file.delete(); + } + + @Test + public void testNoCompression() throws IOException { + createAvroFile(file, null, false, false); + validateAvroFile(file); + } + + @Test + public void testNullCompression() throws IOException { + 
createAvroFile(file, "null", false, false); + validateAvroFile(file); + } + + @Test + public void testDeflateCompression() throws IOException { + createAvroFile(file, "deflate", false, false); + validateAvroFile(file); + } + + @Test + public void testSnappyCompression() throws IOException { + createAvroFile(file, "snappy", false, false); + validateAvroFile(file); + } + + @Test + public void testSchemaUrl() throws IOException { + createAvroFile(file, null, true, false); + validateAvroFile(file); + } + + @Test + public void testStaticSchemaUrl() throws IOException { + createAvroFile(file,null,false, true); + validateAvroFile(file); + } + + @Test + public void testBothUrls() throws IOException { + createAvroFile(file,null,true,true); + validateAvroFile(file); + } + + public void createAvroFile(File file, String codec, boolean useSchemaUrl, + boolean useStaticSchemaUrl) throws IOException { + // serialize a few events using the reflection-based avro serializer + OutputStream out = new FileOutputStream(file); + + Context ctx = new Context(); + if (codec != null) { + ctx.put("compressionCodec", codec); + } + + Schema schema = Schema.createRecord("myrecord", null, null, false); + schema.setFields(Arrays.asList(new Schema.Field[]{ + new Schema.Field("message", Schema.create(Schema.Type.STRING), null, null) + })); + GenericRecordBuilder recordBuilder = new GenericRecordBuilder(schema); + File schemaFile = null; + if (useSchemaUrl || useStaticSchemaUrl) { + schemaFile = File.createTempFile(getClass().getSimpleName(), ".avsc"); + Files.write(schema.toString(), schemaFile, Charsets.UTF_8); + } + + if (useStaticSchemaUrl) { + ctx.put(AvroEventSerializerConfigurationConstants.STATIC_SCHEMA_URL, + schemaFile.toURI().toURL().toExternalForm()); + } + + EventSerializer.Builder builder = new AvroEventSerializer.Builder(); + EventSerializer serializer = builder.build(ctx, out); + + serializer.afterCreate(); + for (int i = 0; i < 3; i++) { + GenericRecord record = 
recordBuilder.set("message", "Hello " + i).build(); + Event event = EventBuilder.withBody(serializeAvro(record, schema)); + if (schemaFile == null && !useSchemaUrl) { + event.getHeaders().put(AvroEventSerializer.AVRO_SCHEMA_LITERAL_HEADER, + schema.toString()); + } else if (useSchemaUrl) { + event.getHeaders().put(AvroEventSerializer.AVRO_SCHEMA_URL_HEADER, + schemaFile.toURI().toURL().toExternalForm()); + } + serializer.write(event); + } + serializer.flush(); + serializer.beforeClose(); + out.flush(); + out.close(); + if (schemaFile != null ) { + schemaFile.delete(); + } + + } + + private byte[] serializeAvro(Object datum, Schema schema) throws IOException { + ByteArrayOutputStream out = new ByteArrayOutputStream(); + ReflectDatumWriter writer = new ReflectDatumWriter(schema); + BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null); + out.reset(); + writer.write(datum, encoder); + encoder.flush(); + return out.toByteArray(); + } + + public void validateAvroFile(File file) throws IOException { + // read the events back using GenericRecord + DatumReader reader = new GenericDatumReader(); + DataFileReader fileReader = + new DataFileReader(file, reader); + GenericRecord record = new GenericData.Record(fileReader.getSchema()); + int numEvents = 0; + while (fileReader.hasNext()) { + fileReader.next(record); + String bodyStr = record.get("message").toString(); + System.out.println(bodyStr); + numEvents++; + } + fileReader.close(); + Assert.assertEquals("Should have found a total of 3 events", 3, numEvents); + } +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestBucketWriter.java b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestBucketWriter.java new file mode 100644 index 0000000..742deb0 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestBucketWriter.java @@ -0,0 +1,450 @@ +/* + * Licensed to the Apache Software Foundation 
(ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.hdfs; + +import com.google.common.base.Charsets; +import org.apache.flume.Clock; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.auth.FlumeAuthenticationUtil; +import org.apache.flume.auth.PrivilegedExecutor; +import org.apache.flume.event.EventBuilder; +import org.apache.flume.instrumentation.SinkCounter; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.SequenceFile; +import org.apache.hadoop.io.SequenceFile.CompressionType; +import org.apache.hadoop.io.compress.CompressionCodec; +import org.junit.AfterClass; +import org.junit.Assert; +import org.junit.BeforeClass; +import org.junit.Test; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.File; +import java.io.IOException; +import java.util.Calendar; +import java.util.concurrent.Executors; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicBoolean; + +public class TestBucketWriter { + + private static Logger logger = LoggerFactory.getLogger(TestBucketWriter.class); + private 
Context ctx = new Context(); + + private static ScheduledExecutorService timedRollerPool; + private static PrivilegedExecutor proxy; + + @BeforeClass + public static void setup() { + timedRollerPool = Executors.newSingleThreadScheduledExecutor(); + proxy = FlumeAuthenticationUtil.getAuthenticator(null, null).proxyAs(null); + } + + @AfterClass + public static void teardown() throws InterruptedException { + timedRollerPool.shutdown(); + timedRollerPool.awaitTermination(2, TimeUnit.SECONDS); + timedRollerPool.shutdownNow(); + } + + @Test + public void testEventCountingRoller() throws IOException, InterruptedException { + int maxEvents = 100; + MockHDFSWriter hdfsWriter = new MockHDFSWriter(); + BucketWriter bucketWriter = new BucketWriter( + 0, 0, maxEvents, 0, ctx, "/tmp", "file", "", ".tmp", null, null, + SequenceFile.CompressionType.NONE, hdfsWriter, timedRollerPool, proxy, + new SinkCounter("test-bucket-writer-" + System.currentTimeMillis()), 0, null, null, 30000, + Executors.newSingleThreadExecutor(), 0, 0); + + Event e = EventBuilder.withBody("foo", Charsets.UTF_8); + for (int i = 0; i < 1000; i++) { + bucketWriter.append(e); + } + + logger.info("Number of events written: {}", hdfsWriter.getEventsWritten()); + logger.info("Number of bytes written: {}", hdfsWriter.getBytesWritten()); + logger.info("Number of files opened: {}", hdfsWriter.getFilesOpened()); + + Assert.assertEquals("events written", 1000, hdfsWriter.getEventsWritten()); + Assert.assertEquals("bytes written", 3000, hdfsWriter.getBytesWritten()); + Assert.assertEquals("files opened", 10, hdfsWriter.getFilesOpened()); + } + + @Test + public void testSizeRoller() throws IOException, InterruptedException { + int maxBytes = 300; + MockHDFSWriter hdfsWriter = new MockHDFSWriter(); + BucketWriter bucketWriter = new BucketWriter( + 0, maxBytes, 0, 0, ctx, "/tmp", "file", "", ".tmp", null, null, + SequenceFile.CompressionType.NONE, hdfsWriter, timedRollerPool, proxy, + new SinkCounter("test-bucket-writer-" + 
System.currentTimeMillis()), 0, null, null, 30000, + Executors.newSingleThreadExecutor(), 0, 0); + + Event e = EventBuilder.withBody("foo", Charsets.UTF_8); + for (int i = 0; i < 1000; i++) { + bucketWriter.append(e); + } + + logger.info("Number of events written: {}", hdfsWriter.getEventsWritten()); + logger.info("Number of bytes written: {}", hdfsWriter.getBytesWritten()); + logger.info("Number of files opened: {}", hdfsWriter.getFilesOpened()); + + Assert.assertEquals("events written", 1000, hdfsWriter.getEventsWritten()); + Assert.assertEquals("bytes written", 3000, hdfsWriter.getBytesWritten()); + Assert.assertEquals("files opened", 10, hdfsWriter.getFilesOpened()); + } + + @Test + public void testIntervalRoller() throws IOException, InterruptedException { + final int ROLL_INTERVAL = 1; // seconds + final int NUM_EVENTS = 10; + final AtomicBoolean calledBack = new AtomicBoolean(false); + + MockHDFSWriter hdfsWriter = new MockHDFSWriter(); + BucketWriter bucketWriter = new BucketWriter( + ROLL_INTERVAL, 0, 0, 0, ctx, "/tmp", "file", "", ".tmp", null, null, + SequenceFile.CompressionType.NONE, hdfsWriter, timedRollerPool, proxy, + new SinkCounter("test-bucket-writer-" + System.currentTimeMillis()), 0, + new HDFSEventSink.WriterCallback() { + @Override + public void run(String filePath) { + calledBack.set(true); + } + }, null, 30000, Executors.newSingleThreadExecutor(), 0, 0); + + Event e = EventBuilder.withBody("foo", Charsets.UTF_8); + long startNanos = System.nanoTime(); + for (int i = 0; i < NUM_EVENTS - 1; i++) { + bucketWriter.append(e); + } + + // sleep to force a roll... 
wait 2x interval just to be sure + Thread.sleep(2 * ROLL_INTERVAL * 1000L); + + Assert.assertTrue(bucketWriter.closed); + Assert.assertTrue(calledBack.get()); + + bucketWriter = new BucketWriter( + ROLL_INTERVAL, 0, 0, 0, ctx, "/tmp", "file", "", ".tmp", null, null, + SequenceFile.CompressionType.NONE, hdfsWriter, timedRollerPool, proxy, + new SinkCounter("test-bucket-writer-" + System.currentTimeMillis()), 0, null, null, 30000, + Executors.newSingleThreadExecutor(), 0, 0); + // write one more event (to reopen a new file so we will roll again later) + bucketWriter.append(e); + + long elapsedMillis = TimeUnit.MILLISECONDS.convert( + System.nanoTime() - startNanos, TimeUnit.NANOSECONDS); + long elapsedSeconds = elapsedMillis / 1000L; + + logger.info("Time elapsed: {} milliseconds", elapsedMillis); + logger.info("Number of events written: {}", hdfsWriter.getEventsWritten()); + logger.info("Number of bytes written: {}", hdfsWriter.getBytesWritten()); + logger.info("Number of files opened: {}", hdfsWriter.getFilesOpened()); + logger.info("Number of files closed: {}", hdfsWriter.getFilesClosed()); + + Assert.assertEquals("events written", NUM_EVENTS, + hdfsWriter.getEventsWritten()); + Assert.assertEquals("bytes written", e.getBody().length * NUM_EVENTS, + hdfsWriter.getBytesWritten()); + Assert.assertEquals("files opened", 2, hdfsWriter.getFilesOpened()); + + // before auto-roll + Assert.assertEquals("files closed", 1, hdfsWriter.getFilesClosed()); + + logger.info("Waiting for roll..."); + Thread.sleep(2 * ROLL_INTERVAL * 1000L); + + logger.info("Number of files closed: {}", hdfsWriter.getFilesClosed()); + Assert.assertEquals("files closed", 2, hdfsWriter.getFilesClosed()); + } + + @Test + public void testIntervalRollerBug() throws IOException, InterruptedException { + final int ROLL_INTERVAL = 1; // seconds + final int NUM_EVENTS = 10; + + HDFSWriter hdfsWriter = new HDFSWriter() { + private volatile boolean open = false; + + public void configure(Context context) { + 
} + + public void sync() throws IOException { + if (!open) { + throw new IOException("closed"); + } + } + + public void open(String filePath, CompressionCodec codec, CompressionType cType) + throws IOException { + open = true; + } + + public void open(String filePath) throws IOException { + open = true; + } + + public void close() throws IOException { + open = false; + } + + @Override + public boolean isUnderReplicated() { + return false; + } + + public void append(Event e) throws IOException { + // we just re-open in append if closed + open = true; + } + }; + + HDFSTextSerializer serializer = new HDFSTextSerializer(); + File tmpFile = File.createTempFile("flume", "test"); + tmpFile.deleteOnExit(); + String path = tmpFile.getParent(); + String name = tmpFile.getName(); + + BucketWriter bucketWriter = new BucketWriter( + ROLL_INTERVAL, 0, 0, 0, ctx, path, name, "", ".tmp", null, null, + SequenceFile.CompressionType.NONE, hdfsWriter, timedRollerPool, proxy, + new SinkCounter("test-bucket-writer-" + System.currentTimeMillis()), 0, null, null, 30000, + Executors.newSingleThreadExecutor(), 0, 0); + + Event e = EventBuilder.withBody("foo", Charsets.UTF_8); + for (int i = 0; i < NUM_EVENTS - 1; i++) { + bucketWriter.append(e); + } + + // sleep to force a roll... wait 2x interval just to be sure + Thread.sleep(2 * ROLL_INTERVAL * 1000L); + + bucketWriter.flush(); // throws closed exception + } + + @Test + public void testFileSuffixNotGiven() throws IOException, InterruptedException { + final int ROLL_INTERVAL = 1000; // seconds. 
Make sure it doesn't change in course of test + final String suffix = null; + + MockHDFSWriter hdfsWriter = new MockHDFSWriter(); + BucketWriter bucketWriter = new BucketWriter( + ROLL_INTERVAL, 0, 0, 0, ctx, "/tmp", "file", "", ".tmp", suffix, null, + SequenceFile.CompressionType.NONE, hdfsWriter, timedRollerPool, proxy, + new SinkCounter("test-bucket-writer-" + System.currentTimeMillis()), 0, null, null, 30000, + Executors.newSingleThreadExecutor(), 0, 0); + + // Need to override system time use for test so we know what to expect + final long testTime = System.currentTimeMillis(); + Clock testClock = new Clock() { + public long currentTimeMillis() { + return testTime; + } + }; + bucketWriter.setClock(testClock); + + Event e = EventBuilder.withBody("foo", Charsets.UTF_8); + bucketWriter.append(e); + + Assert.assertTrue("Incorrect suffix", hdfsWriter.getOpenedFilePath().endsWith( + Long.toString(testTime + 1) + ".tmp")); + } + + @Test + public void testFileSuffixGiven() throws IOException, InterruptedException { + final int ROLL_INTERVAL = 1000; // seconds. 
Make sure it doesn't change in course of test + final String suffix = ".avro"; + + MockHDFSWriter hdfsWriter = new MockHDFSWriter(); + BucketWriter bucketWriter = new BucketWriter( + ROLL_INTERVAL, 0, 0, 0, ctx, "/tmp", "file", "", ".tmp", suffix, null, + SequenceFile.CompressionType.NONE, hdfsWriter, timedRollerPool, proxy, + new SinkCounter("test-bucket-writer-" + System.currentTimeMillis()), 0, null, null, 30000, + Executors.newSingleThreadExecutor(), 0, 0); + + // Need to override system time use for test so we know what to expect + + final long testTime = System.currentTimeMillis(); + + Clock testClock = new Clock() { + public long currentTimeMillis() { + return testTime; + } + }; + bucketWriter.setClock(testClock); + + Event e = EventBuilder.withBody("foo", Charsets.UTF_8); + bucketWriter.append(e); + + Assert.assertTrue("Incorrect suffix",hdfsWriter.getOpenedFilePath().endsWith( + Long.toString(testTime + 1) + suffix + ".tmp")); + } + + @Test + public void testFileSuffixCompressed() + throws IOException, InterruptedException { + final int ROLL_INTERVAL = 1000; // seconds. 
Make sure it doesn't change in course of test + final String suffix = ".foo"; + + MockHDFSWriter hdfsWriter = new MockHDFSWriter(); + BucketWriter bucketWriter = new BucketWriter( + ROLL_INTERVAL, 0, 0, 0, ctx, "/tmp", "file", "", ".tmp", suffix, + HDFSEventSink.getCodec("gzip"), SequenceFile.CompressionType.BLOCK, hdfsWriter, + timedRollerPool, proxy, new SinkCounter("test-bucket-writer-" + System.currentTimeMillis()), + 0, null, null, 30000, Executors.newSingleThreadExecutor(), 0, 0 + ); + + // Need to override system time use for test so we know what to expect + final long testTime = System.currentTimeMillis(); + + Clock testClock = new Clock() { + public long currentTimeMillis() { + return testTime; + } + }; + bucketWriter.setClock(testClock); + + Event e = EventBuilder.withBody("foo", Charsets.UTF_8); + bucketWriter.append(e); + + Assert.assertTrue("Incorrect suffix", hdfsWriter.getOpenedFilePath().endsWith( + Long.toString(testTime + 1) + suffix + ".tmp")); + } + + @Test + public void testInUsePrefix() throws IOException, InterruptedException { + final int ROLL_INTERVAL = 1000; // seconds. Make sure it doesn't change in course of test + final String PREFIX = "BRNO_IS_CITY_IN_CZECH_REPUBLIC"; + + MockHDFSWriter hdfsWriter = new MockHDFSWriter(); + HDFSTextSerializer formatter = new HDFSTextSerializer(); + BucketWriter bucketWriter = new BucketWriter( + ROLL_INTERVAL, 0, 0, 0, ctx, "/tmp", "file", PREFIX, ".tmp", null, null, + SequenceFile.CompressionType.NONE, hdfsWriter, timedRollerPool, proxy, + new SinkCounter("test-bucket-writer-" + System.currentTimeMillis()), 0, null, null, 30000, + Executors.newSingleThreadExecutor(), 0, 0); + + Event e = EventBuilder.withBody("foo", Charsets.UTF_8); + bucketWriter.append(e); + + Assert.assertTrue("Incorrect in use prefix", hdfsWriter.getOpenedFilePath().contains(PREFIX)); + } + + @Test + public void testInUseSuffix() throws IOException, InterruptedException { + final int ROLL_INTERVAL = 1000; // seconds. 
Make sure it doesn't change in course of test + final String SUFFIX = "WELCOME_TO_THE_HELLMOUNTH"; + + MockHDFSWriter hdfsWriter = new MockHDFSWriter(); + HDFSTextSerializer serializer = new HDFSTextSerializer(); + BucketWriter bucketWriter = new BucketWriter( + ROLL_INTERVAL, 0, 0, 0, ctx, "/tmp", "file", "", SUFFIX, null, null, + SequenceFile.CompressionType.NONE, hdfsWriter, timedRollerPool, proxy, + new SinkCounter("test-bucket-writer-" + System.currentTimeMillis()), 0, null, null, 30000, + Executors.newSingleThreadExecutor(), 0, 0); + + Event e = EventBuilder.withBody("foo", Charsets.UTF_8); + bucketWriter.append(e); + + Assert.assertTrue("Incorrect in use suffix", hdfsWriter.getOpenedFilePath().contains(SUFFIX)); + } + + @Test + public void testCallbackOnClose() throws IOException, InterruptedException { + final int ROLL_INTERVAL = 1000; // seconds. Make sure it doesn't change in course of test + final String SUFFIX = "WELCOME_TO_THE_EREBOR"; + final AtomicBoolean callbackCalled = new AtomicBoolean(false); + + MockHDFSWriter hdfsWriter = new MockHDFSWriter(); + BucketWriter bucketWriter = new BucketWriter( + ROLL_INTERVAL, 0, 0, 0, ctx, "/tmp", "file", "", SUFFIX, null, null, + SequenceFile.CompressionType.NONE, hdfsWriter, timedRollerPool, proxy, + new SinkCounter("test-bucket-writer-" + System.currentTimeMillis()), 0, + new HDFSEventSink.WriterCallback() { + @Override + public void run(String filePath) { + callbackCalled.set(true); + } + }, "blah", 30000, Executors.newSingleThreadExecutor(), 0, 0); + + Event e = EventBuilder.withBody("foo", Charsets.UTF_8); + bucketWriter.append(e); + bucketWriter.close(true); + + Assert.assertTrue(callbackCalled.get()); + } + + + + @Test + public void testSequenceFileRenameRetries() throws Exception { + SequenceFileRenameRetryCoreTest(1, true); + SequenceFileRenameRetryCoreTest(5, true); + SequenceFileRenameRetryCoreTest(2, true); + + SequenceFileRenameRetryCoreTest(1, false); + SequenceFileRenameRetryCoreTest(5, false); + 
SequenceFileRenameRetryCoreTest(2, false); + } + + public void SequenceFileRenameRetryCoreTest(int numberOfRetriesRequired, boolean closeSucceed) + throws Exception { + String hdfsPath = "file:///tmp/flume-test." + + Calendar.getInstance().getTimeInMillis() + + "." + Thread.currentThread().getId(); + + Context context = new Context(); + Configuration conf = new Configuration(); + FileSystem fs = FileSystem.get(conf); + Path dirPath = new Path(hdfsPath); + fs.delete(dirPath, true); + fs.mkdirs(dirPath); + context.put("hdfs.path", hdfsPath); + context.put("hdfs.closeTries", String.valueOf(numberOfRetriesRequired)); + context.put("hdfs.rollCount", "1"); + context.put("hdfs.retryInterval", "1"); + context.put("hdfs.callTimeout", Long.toString(1000)); + MockFileSystem mockFs = new MockFileSystem(fs, numberOfRetriesRequired, closeSucceed); + BucketWriter bucketWriter = new BucketWriter( + 0, 0, 1, 1, ctx, hdfsPath, hdfsPath, "singleBucket", ".tmp", null, null, + null, new MockDataStream(mockFs), timedRollerPool, proxy, + new SinkCounter("test-bucket-writer-" + System.currentTimeMillis()), 0, null, null, 30000, + Executors.newSingleThreadExecutor(), 1, numberOfRetriesRequired); + + bucketWriter.setFileSystem(mockFs); + // At this point, we checked if isFileClosed is available in + // this JVM, so lets make it check again. 
+ Event event = EventBuilder.withBody("test", Charsets.UTF_8); + bucketWriter.append(event); + // This is what triggers the close, so a 2nd append is required :/ + bucketWriter.append(event); + + TimeUnit.SECONDS.sleep(numberOfRetriesRequired + 2); + + Assert.assertTrue("Expected " + numberOfRetriesRequired + " " + + "but got " + bucketWriter.renameTries.get(), + bucketWriter.renameTries.get() == numberOfRetriesRequired); + } +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSCompressedDataStream.java b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSCompressedDataStream.java new file mode 100644 index 0000000..80f199b --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSCompressedDataStream.java @@ -0,0 +1,141 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flume.sink.hdfs; + +import java.io.File; +import java.io.FileInputStream; +import java.nio.ByteBuffer; +import java.nio.charset.CharsetDecoder; +import java.util.List; +import java.util.zip.GZIPInputStream; + +import org.apache.avro.file.DataFileStream; +import org.apache.avro.generic.GenericData; +import org.apache.avro.generic.GenericDatumReader; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.io.DatumReader; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.event.EventBuilder; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.SequenceFile; +import org.apache.hadoop.io.compress.CompressionCodecFactory; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.google.common.base.Charsets; +import com.google.common.collect.Lists; + +public class TestHDFSCompressedDataStream { + + private static final Logger logger = + LoggerFactory.getLogger(TestHDFSCompressedDataStream.class); + + private File file; + private String fileURI; + private CompressionCodecFactory factory; + + @Before + public void init() throws Exception { + this.file = new File("target/test/data/foo.gz"); + this.fileURI = file.getAbsoluteFile().toURI().toString(); + logger.info("File URI: {}", fileURI); + + Configuration conf = new Configuration(); + // local FS must be raw in order to be Syncable + conf.set("fs.file.impl", "org.apache.hadoop.fs.RawLocalFileSystem"); + Path path = new Path(fileURI); + path.getFileSystem(conf); // get FS with our conf cached + + this.factory = new CompressionCodecFactory(conf); + } + + // make sure the data makes it to disk if we sync() the data stream + @Test + public void testGzipDurability() throws Exception { + Context context = new Context(); + HDFSCompressedDataStream writer = new HDFSCompressedDataStream(); + 
writer.configure(context); + writer.open(fileURI, factory.getCodec(new Path(fileURI)), + SequenceFile.CompressionType.BLOCK); + + String[] bodies = { "yarf!" }; + writeBodies(writer, bodies); + + byte[] buf = new byte[256]; + GZIPInputStream cmpIn = new GZIPInputStream(new FileInputStream(file)); + int len = cmpIn.read(buf); + String result = new String(buf, 0, len, Charsets.UTF_8); + result = result.trim(); // BodyTextEventSerializer adds a newline + + Assert.assertEquals("input and output must match", bodies[0], result); + } + + @Test + public void testGzipDurabilityWithSerializer() throws Exception { + Context context = new Context(); + context.put("serializer", "AVRO_EVENT"); + + HDFSCompressedDataStream writer = new HDFSCompressedDataStream(); + writer.configure(context); + + writer.open(fileURI, factory.getCodec(new Path(fileURI)), + SequenceFile.CompressionType.BLOCK); + + String[] bodies = { "yarf!", "yarfing!" }; + writeBodies(writer, bodies); + + int found = 0; + int expected = bodies.length; + List expectedBodies = Lists.newArrayList(bodies); + + GZIPInputStream cmpIn = new GZIPInputStream(new FileInputStream(file)); + DatumReader reader = new GenericDatumReader(); + DataFileStream avroStream = + new DataFileStream(cmpIn, reader); + GenericRecord record = new GenericData.Record(avroStream.getSchema()); + while (avroStream.hasNext()) { + avroStream.next(record); + CharsetDecoder decoder = Charsets.UTF_8.newDecoder(); + String bodyStr = decoder.decode((ByteBuffer) record.get("body")) + .toString(); + expectedBodies.remove(bodyStr); + found++; + } + avroStream.close(); + cmpIn.close(); + + Assert.assertTrue("Found = " + found + ", Expected = " + expected + + ", Left = " + expectedBodies.size() + " " + expectedBodies, + expectedBodies.size() == 0); + } + + private void writeBodies(HDFSCompressedDataStream writer, String... 
bodies) + throws Exception { + for (String body : bodies) { + Event evt = EventBuilder.withBody(body, Charsets.UTF_8); + writer.append(evt); + } + writer.sync(); + } +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java new file mode 100644 index 0000000..782cf47 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java @@ -0,0 +1,1548 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.flume.sink.hdfs; + +import java.io.BufferedReader; +import java.io.File; +import java.io.IOException; +import java.io.InputStreamReader; +import java.nio.ByteBuffer; +import java.nio.charset.CharsetDecoder; +import java.util.Arrays; +import java.util.Calendar; +import java.util.Collection; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.UUID; +import java.util.concurrent.TimeUnit; + +import com.google.common.collect.Maps; +import org.apache.avro.file.DataFileStream; +import org.apache.avro.generic.GenericData; +import org.apache.avro.generic.GenericDatumReader; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.io.DatumReader; +import org.apache.commons.lang.StringUtils; +import org.apache.flume.Channel; +import org.apache.flume.Clock; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.EventDeliveryException; +import org.apache.flume.Sink.Status; +import org.apache.flume.SystemClock; +import org.apache.flume.Transaction; +import org.apache.flume.channel.MemoryChannel; +import org.apache.flume.conf.Configurables; +import org.apache.flume.event.EventBuilder; +import org.apache.flume.event.SimpleEvent; +import org.apache.flume.lifecycle.LifecycleException; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.CommonConfigurationKeys; +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FileStatus; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.FileUtil; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.BytesWritable; +import org.apache.hadoop.io.LongWritable; +import org.apache.hadoop.io.SequenceFile; +import org.apache.hadoop.security.UserGroupInformation; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.google.common.base.Charsets; 
+import com.google.common.collect.Lists; + +public class TestHDFSEventSink { + + private HDFSEventSink sink; + private String testPath; + private static final Logger LOG = LoggerFactory + .getLogger(HDFSEventSink.class); + + static { + System.setProperty("java.security.krb5.realm", "flume"); + System.setProperty("java.security.krb5.kdc", "blah"); + } + + private void dirCleanup() { + Configuration conf = new Configuration(); + try { + FileSystem fs = FileSystem.get(conf); + Path dirPath = new Path(testPath); + if (fs.exists(dirPath)) { + fs.delete(dirPath, true); + } + } catch (IOException eIO) { + LOG.warn("IO Error in test cleanup", eIO); + } + } + + // TODO: use System.getProperty("file.separator") instead of hardcoded '/' + @Before + public void setUp() { + LOG.debug("Starting..."); + /* + * FIXME: Use a dynamic path to support concurrent test execution. Also, + * beware of the case where this path is used for something or when the + * Hadoop config points at file:/// rather than hdfs://. We need to find a + * better way of testing HDFS related functionality. + */ + testPath = "file:///tmp/flume-test." + + Calendar.getInstance().getTimeInMillis() + "." 
+ + Thread.currentThread().getId(); + + sink = new HDFSEventSink(); + sink.setName("HDFSEventSink-" + UUID.randomUUID().toString()); + dirCleanup(); + } + + @After + public void tearDown() { + if (System.getenv("hdfs_keepFiles") == null) dirCleanup(); + } + + @Test + public void testTextBatchAppend() throws Exception { + doTestTextBatchAppend(false); + } + + @Test + public void testTextBatchAppendRawFS() throws Exception { + doTestTextBatchAppend(true); + } + + public void doTestTextBatchAppend(boolean useRawLocalFileSystem) + throws Exception { + LOG.debug("Starting..."); + + final long rollCount = 10; + final long batchSize = 2; + final String fileName = "FlumeData"; + String newPath = testPath + "/singleTextBucket"; + int totalEvents = 0; + int i = 1, j = 1; + + // clear the test directory + Configuration conf = new Configuration(); + FileSystem fs = FileSystem.get(conf); + Path dirPath = new Path(newPath); + fs.delete(dirPath, true); + fs.mkdirs(dirPath); + + Context context = new Context(); + + // context.put("hdfs.path", testPath + "/%Y-%m-%d/%H"); + context.put("hdfs.path", newPath); + context.put("hdfs.filePrefix", fileName); + context.put("hdfs.rollCount", String.valueOf(rollCount)); + context.put("hdfs.rollInterval", "0"); + context.put("hdfs.rollSize", "0"); + context.put("hdfs.batchSize", String.valueOf(batchSize)); + context.put("hdfs.writeFormat", "Text"); + context.put("hdfs.useRawLocalFileSystem", + Boolean.toString(useRawLocalFileSystem)); + context.put("hdfs.fileType", "DataStream"); + + Configurables.configure(sink, context); + + Channel channel = new MemoryChannel(); + Configurables.configure(channel, context); + + sink.setChannel(channel); + sink.start(); + + Calendar eventDate = Calendar.getInstance(); + List bodies = Lists.newArrayList(); + + // push the event batches into channel to roll twice + for (i = 1; i <= (rollCount * 10) / batchSize; i++) { + Transaction txn = channel.getTransaction(); + txn.begin(); + for (j = 1; j <= batchSize; 
j++) { + Event event = new SimpleEvent(); + eventDate.clear(); + eventDate.set(2011, i, i, i, 0); // yy mm dd + String body = "Test." + i + "." + j; + event.setBody(body.getBytes()); + bodies.add(body); + channel.put(event); + totalEvents++; + } + txn.commit(); + txn.close(); + + // execute sink to process the events + sink.process(); + } + + sink.stop(); + + // loop through all the files generated and check their contains + FileStatus[] dirStat = fs.listStatus(dirPath); + Path[] fList = FileUtil.stat2Paths(dirStat); + + // check that the roll happened correctly for the given data + long expectedFiles = totalEvents / rollCount; + if (totalEvents % rollCount > 0) expectedFiles++; + Assert.assertEquals("num files wrong, found: " + + Lists.newArrayList(fList), expectedFiles, fList.length); + // check the contents of the all files + verifyOutputTextFiles(fs, conf, dirPath.toUri().getPath(), fileName, bodies); + } + + @Test + public void testLifecycle() throws InterruptedException, LifecycleException { + LOG.debug("Starting..."); + Context context = new Context(); + + context.put("hdfs.path", testPath); + /* + * context.put("hdfs.rollInterval", String.class); + * context.get("hdfs.rollSize", String.class); context.get("hdfs.rollCount", + * String.class); + */ + Configurables.configure(sink, context); + + sink.setChannel(new MemoryChannel()); + + sink.start(); + sink.stop(); + } + + @Test + public void testEmptyChannelResultsInStatusBackoff() + throws InterruptedException, LifecycleException, EventDeliveryException { + LOG.debug("Starting..."); + Context context = new Context(); + Channel channel = new MemoryChannel(); + context.put("hdfs.path", testPath); + context.put("keep-alive", "0"); + Configurables.configure(sink, context); + Configurables.configure(channel, context); + sink.setChannel(channel); + sink.start(); + Assert.assertEquals(Status.BACKOFF, sink.process()); + sink.stop(); + } + + @Test + public void testKerbFileAccess() throws InterruptedException, + 
LifecycleException, EventDeliveryException, IOException { + LOG.debug("Starting testKerbFileAccess() ..."); + final String fileName = "FlumeData"; + final long rollCount = 5; + final long batchSize = 2; + String newPath = testPath + "/singleBucket"; + String kerbConfPrincipal = "user1/localhost@EXAMPLE.COM"; + String kerbKeytab = "/usr/lib/flume/nonexistkeytabfile"; + + //turn security on + Configuration conf = new Configuration(); + conf.set(CommonConfigurationKeys.HADOOP_SECURITY_AUTHENTICATION, + "kerberos"); + UserGroupInformation.setConfiguration(conf); + + Context context = new Context(); + context.put("hdfs.path", newPath); + context.put("hdfs.filePrefix", fileName); + context.put("hdfs.rollCount", String.valueOf(rollCount)); + context.put("hdfs.batchSize", String.valueOf(batchSize)); + context.put("hdfs.kerberosPrincipal", kerbConfPrincipal); + context.put("hdfs.kerberosKeytab", kerbKeytab); + + try { + Configurables.configure(sink, context); + Assert.fail("no exception thrown"); + } catch (IllegalArgumentException expected) { + Assert.assertTrue(expected.getMessage().contains( + "Keytab is not a readable file")); + } finally { + //turn security off + conf.set(CommonConfigurationKeys.HADOOP_SECURITY_AUTHENTICATION, + "simple"); + UserGroupInformation.setConfiguration(conf); + } + } + + @Test + public void testTextAppend() throws InterruptedException, LifecycleException, + EventDeliveryException, IOException { + + LOG.debug("Starting..."); + final long rollCount = 3; + final long batchSize = 2; + final String fileName = "FlumeData"; + String newPath = testPath + "/singleTextBucket"; + int totalEvents = 0; + int i = 1, j = 1; + + // clear the test directory + Configuration conf = new Configuration(); + FileSystem fs = FileSystem.get(conf); + Path dirPath = new Path(newPath); + fs.delete(dirPath, true); + fs.mkdirs(dirPath); + + Context context = new Context(); + + // context.put("hdfs.path", testPath + "/%Y-%m-%d/%H"); + context.put("hdfs.path", newPath); + 
context.put("hdfs.filePrefix", fileName); + context.put("hdfs.rollCount", String.valueOf(rollCount)); + context.put("hdfs.batchSize", String.valueOf(batchSize)); + context.put("hdfs.writeFormat", "Text"); + context.put("hdfs.fileType", "DataStream"); + + Configurables.configure(sink, context); + + Channel channel = new MemoryChannel(); + Configurables.configure(channel, context); + + sink.setChannel(channel); + sink.start(); + + Calendar eventDate = Calendar.getInstance(); + List bodies = Lists.newArrayList(); + + // push the event batches into channel + for (i = 1; i < 4; i++) { + Transaction txn = channel.getTransaction(); + txn.begin(); + for (j = 1; j <= batchSize; j++) { + Event event = new SimpleEvent(); + eventDate.clear(); + eventDate.set(2011, i, i, i, 0); // yy mm dd + event.getHeaders().put("timestamp", + String.valueOf(eventDate.getTimeInMillis())); + event.getHeaders().put("hostname", "Host" + i); + String body = "Test." + i + "." + j; + event.setBody(body.getBytes()); + bodies.add(body); + channel.put(event); + totalEvents++; + } + txn.commit(); + txn.close(); + + // execute sink to process the events + sink.process(); + } + + sink.stop(); + + // loop through all the files generated and check their contains + FileStatus[] dirStat = fs.listStatus(dirPath); + Path[] fList = FileUtil.stat2Paths(dirStat); + + // check that the roll happened correctly for the given data + long expectedFiles = totalEvents / rollCount; + if (totalEvents % rollCount > 0) expectedFiles++; + Assert.assertEquals("num files wrong, found: " + + Lists.newArrayList(fList), expectedFiles, fList.length); + verifyOutputTextFiles(fs, conf, dirPath.toUri().getPath(), fileName, bodies); + } + + @Test + public void testAvroAppend() throws InterruptedException, LifecycleException, + EventDeliveryException, IOException { + + LOG.debug("Starting..."); + final long rollCount = 3; + final long batchSize = 2; + final String fileName = "FlumeData"; + String newPath = testPath + 
"/singleTextBucket"; + int totalEvents = 0; + int i = 1, j = 1; + + // clear the test directory + Configuration conf = new Configuration(); + FileSystem fs = FileSystem.get(conf); + Path dirPath = new Path(newPath); + fs.delete(dirPath, true); + fs.mkdirs(dirPath); + + Context context = new Context(); + + // context.put("hdfs.path", testPath + "/%Y-%m-%d/%H"); + context.put("hdfs.path", newPath); + context.put("hdfs.filePrefix", fileName); + context.put("hdfs.rollCount", String.valueOf(rollCount)); + context.put("hdfs.batchSize", String.valueOf(batchSize)); + context.put("hdfs.writeFormat", "Text"); + context.put("hdfs.fileType", "DataStream"); + context.put("serializer", "AVRO_EVENT"); + + Configurables.configure(sink, context); + + Channel channel = new MemoryChannel(); + Configurables.configure(channel, context); + + sink.setChannel(channel); + sink.start(); + + Calendar eventDate = Calendar.getInstance(); + List bodies = Lists.newArrayList(); + + // push the event batches into channel + for (i = 1; i < 4; i++) { + Transaction txn = channel.getTransaction(); + txn.begin(); + for (j = 1; j <= batchSize; j++) { + Event event = new SimpleEvent(); + eventDate.clear(); + eventDate.set(2011, i, i, i, 0); // yy mm dd + event.getHeaders().put("timestamp", + String.valueOf(eventDate.getTimeInMillis())); + event.getHeaders().put("hostname", "Host" + i); + String body = "Test." + i + "." 
+ j; + event.setBody(body.getBytes()); + bodies.add(body); + channel.put(event); + totalEvents++; + } + txn.commit(); + txn.close(); + + // execute sink to process the events + sink.process(); + } + + sink.stop(); + + // loop through all the files generated and check their contains + FileStatus[] dirStat = fs.listStatus(dirPath); + Path[] fList = FileUtil.stat2Paths(dirStat); + + // check that the roll happened correctly for the given data + long expectedFiles = totalEvents / rollCount; + if (totalEvents % rollCount > 0) expectedFiles++; + Assert.assertEquals("num files wrong, found: " + + Lists.newArrayList(fList), expectedFiles, fList.length); + verifyOutputAvroFiles(fs, conf, dirPath.toUri().getPath(), fileName, bodies); + } + + @Test + public void testSimpleAppend() throws InterruptedException, + LifecycleException, EventDeliveryException, IOException { + + LOG.debug("Starting..."); + final String fileName = "FlumeData"; + final long rollCount = 5; + final long batchSize = 2; + final int numBatches = 4; + String newPath = testPath + "/singleBucket"; + int totalEvents = 0; + int i = 1, j = 1; + + // clear the test directory + Configuration conf = new Configuration(); + FileSystem fs = FileSystem.get(conf); + Path dirPath = new Path(newPath); + fs.delete(dirPath, true); + fs.mkdirs(dirPath); + + Context context = new Context(); + + context.put("hdfs.path", newPath); + context.put("hdfs.filePrefix", fileName); + context.put("hdfs.rollCount", String.valueOf(rollCount)); + context.put("hdfs.batchSize", String.valueOf(batchSize)); + + Configurables.configure(sink, context); + + Channel channel = new MemoryChannel(); + Configurables.configure(channel, context); + + sink.setChannel(channel); + sink.start(); + + Calendar eventDate = Calendar.getInstance(); + List bodies = Lists.newArrayList(); + + // push the event batches into channel + for (i = 1; i < numBatches; i++) { + Transaction txn = channel.getTransaction(); + txn.begin(); + for (j = 1; j <= batchSize; j++) { + 
Event event = new SimpleEvent(); + eventDate.clear(); + eventDate.set(2011, i, i, i, 0); // yy mm dd + event.getHeaders().put("timestamp", + String.valueOf(eventDate.getTimeInMillis())); + event.getHeaders().put("hostname", "Host" + i); + String body = "Test." + i + "." + j; + event.setBody(body.getBytes()); + bodies.add(body); + channel.put(event); + totalEvents++; + } + txn.commit(); + txn.close(); + + // execute sink to process the events + sink.process(); + } + + sink.stop(); + + // loop through all the files generated and check their contains + FileStatus[] dirStat = fs.listStatus(dirPath); + Path[] fList = FileUtil.stat2Paths(dirStat); + + // check that the roll happened correctly for the given data + long expectedFiles = totalEvents / rollCount; + if (totalEvents % rollCount > 0) expectedFiles++; + Assert.assertEquals("num files wrong, found: " + + Lists.newArrayList(fList), expectedFiles, fList.length); + verifyOutputSequenceFiles(fs, conf, dirPath.toUri().getPath(), fileName, bodies); + } + + @Test + public void testSimpleAppendLocalTime() + throws InterruptedException, LifecycleException, EventDeliveryException, IOException { + final long currentTime = System.currentTimeMillis(); + Clock clk = new Clock() { + @Override + public long currentTimeMillis() { + return currentTime; + } + }; + + LOG.debug("Starting..."); + final String fileName = "FlumeData"; + final long rollCount = 5; + final long batchSize = 2; + final int numBatches = 4; + String newPath = testPath + "/singleBucket/%s" ; + String expectedPath = testPath + "/singleBucket/" + + String.valueOf(currentTime / 1000); + int totalEvents = 0; + int i = 1, j = 1; + + // clear the test directory + Configuration conf = new Configuration(); + FileSystem fs = FileSystem.get(conf); + Path dirPath = new Path(expectedPath); + fs.delete(dirPath, true); + fs.mkdirs(dirPath); + + Context context = new Context(); + + context.put("hdfs.path", newPath); + context.put("hdfs.filePrefix", fileName); + 
context.put("hdfs.rollCount", String.valueOf(rollCount)); + context.put("hdfs.batchSize", String.valueOf(batchSize)); + context.put("hdfs.useLocalTimeStamp", String.valueOf(true)); + + Configurables.configure(sink, context); + + Channel channel = new MemoryChannel(); + Configurables.configure(channel, context); + + sink.setChannel(channel); + sink.setBucketClock(clk); + sink.start(); + + Calendar eventDate = Calendar.getInstance(); + List bodies = Lists.newArrayList(); + + // push the event batches into channel + for (i = 1; i < numBatches; i++) { + Transaction txn = channel.getTransaction(); + txn.begin(); + for (j = 1; j <= batchSize; j++) { + Event event = new SimpleEvent(); + eventDate.clear(); + eventDate.set(2011, i, i, i, 0); // yy mm dd + event.getHeaders().put("timestamp", + String.valueOf(eventDate.getTimeInMillis())); + event.getHeaders().put("hostname", "Host" + i); + String body = "Test." + i + "." + j; + event.setBody(body.getBytes()); + bodies.add(body); + channel.put(event); + totalEvents++; + } + txn.commit(); + txn.close(); + + // execute sink to process the events + sink.process(); + } + + sink.stop(); + + // loop through all the files generated and check their contains + FileStatus[] dirStat = fs.listStatus(dirPath); + Path[] fList = FileUtil.stat2Paths(dirStat); + + // check that the roll happened correctly for the given data + long expectedFiles = totalEvents / rollCount; + if (totalEvents % rollCount > 0) expectedFiles++; + Assert.assertEquals("num files wrong, found: " + + Lists.newArrayList(fList), expectedFiles, fList.length); + verifyOutputSequenceFiles(fs, conf, dirPath.toUri().getPath(), fileName, bodies); + // The clock in bucketpath is static, so restore the real clock + sink.setBucketClock(new SystemClock()); + } + + @Test + public void testAppend() throws InterruptedException, LifecycleException, + EventDeliveryException, IOException { + + LOG.debug("Starting..."); + final long rollCount = 3; + final long batchSize = 2; + final 
String fileName = "FlumeData"; + + // clear the test directory + Configuration conf = new Configuration(); + FileSystem fs = FileSystem.get(conf); + Path dirPath = new Path(testPath); + fs.delete(dirPath, true); + fs.mkdirs(dirPath); + + Context context = new Context(); + + context.put("hdfs.path", testPath + "/%Y-%m-%d/%H"); + context.put("hdfs.timeZone", "UTC"); + context.put("hdfs.filePrefix", fileName); + context.put("hdfs.rollCount", String.valueOf(rollCount)); + context.put("hdfs.batchSize", String.valueOf(batchSize)); + + Configurables.configure(sink, context); + + Channel channel = new MemoryChannel(); + Configurables.configure(channel, context); + + sink.setChannel(channel); + sink.start(); + + Calendar eventDate = Calendar.getInstance(); + List bodies = Lists.newArrayList(); + // push the event batches into channel + for (int i = 1; i < 4; i++) { + Transaction txn = channel.getTransaction(); + txn.begin(); + for (int j = 1; j <= batchSize; j++) { + Event event = new SimpleEvent(); + eventDate.clear(); + eventDate.set(2011, i, i, i, 0); // yy mm dd + event.getHeaders().put("timestamp", + String.valueOf(eventDate.getTimeInMillis())); + event.getHeaders().put("hostname", "Host" + i); + String body = "Test." + i + "." 
+ j; + event.setBody(body.getBytes()); + bodies.add(body); + channel.put(event); + } + txn.commit(); + txn.close(); + + // execute sink to process the events + sink.process(); + } + + sink.stop(); + verifyOutputSequenceFiles(fs, conf, dirPath.toUri().getPath(), fileName, bodies); + } + + // inject fault and make sure that the txn is rolled back and retried + @Test + public void testBadSimpleAppend() throws InterruptedException, + LifecycleException, EventDeliveryException, IOException { + + LOG.debug("Starting..."); + final String fileName = "FlumeData"; + final long rollCount = 5; + final long batchSize = 2; + final int numBatches = 4; + String newPath = testPath + "/singleBucket"; + int totalEvents = 0; + int i = 1, j = 1; + + HDFSTestWriterFactory badWriterFactory = new HDFSTestWriterFactory(); + sink = new HDFSEventSink(badWriterFactory); + + // clear the test directory + Configuration conf = new Configuration(); + FileSystem fs = FileSystem.get(conf); + Path dirPath = new Path(newPath); + fs.delete(dirPath, true); + fs.mkdirs(dirPath); + + Context context = new Context(); + + context.put("hdfs.path", newPath); + context.put("hdfs.filePrefix", fileName); + context.put("hdfs.rollCount", String.valueOf(rollCount)); + context.put("hdfs.batchSize", String.valueOf(batchSize)); + context.put("hdfs.fileType", HDFSTestWriterFactory.TestSequenceFileType); + + Configurables.configure(sink, context); + + Channel channel = new MemoryChannel(); + Configurables.configure(channel, context); + + sink.setChannel(channel); + sink.start(); + + Calendar eventDate = Calendar.getInstance(); + + List bodies = Lists.newArrayList(); + // push the event batches into channel + for (i = 1; i < numBatches; i++) { + Transaction txn = channel.getTransaction(); + txn.begin(); + for (j = 1; j <= batchSize; j++) { + Event event = new SimpleEvent(); + eventDate.clear(); + eventDate.set(2011, i, i, i, 0); // yy mm dd + event.getHeaders().put("timestamp", + 
String.valueOf(eventDate.getTimeInMillis())); + event.getHeaders().put("hostname", "Host" + i); + + String body = "Test." + i + "." + j; + event.setBody(body.getBytes()); + bodies.add(body); + // inject fault + if ((totalEvents % 30) == 1) { + event.getHeaders().put("fault-once", ""); + } + channel.put(event); + totalEvents++; + } + txn.commit(); + txn.close(); + + LOG.info("Process events: " + sink.process()); + } + LOG.info("Process events to end of transaction max: " + sink.process()); + LOG.info("Process events to injected fault: " + sink.process()); + LOG.info("Process events remaining events: " + sink.process()); + sink.stop(); + verifyOutputSequenceFiles(fs, conf, dirPath.toUri().getPath(), fileName, bodies); + + } + + + private List getAllFiles(String input) { + List output = Lists.newArrayList(); + File dir = new File(input); + if (dir.isFile()) { + output.add(dir.getAbsolutePath()); + } else if (dir.isDirectory()) { + for (String file : dir.list()) { + File subDir = new File(dir, file); + output.addAll(getAllFiles(subDir.getAbsolutePath())); + } + } + return output; + } + + private void verifyOutputSequenceFiles(FileSystem fs, Configuration conf, String dir, + String prefix, List bodies) throws IOException { + int found = 0; + int expected = bodies.size(); + for (String outputFile : getAllFiles(dir)) { + String name = (new File(outputFile)).getName(); + if (name.startsWith(prefix)) { + SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path(outputFile), conf); + LongWritable key = new LongWritable(); + BytesWritable value = new BytesWritable(); + while (reader.next(key, value)) { + String body = new String(value.getBytes(), 0, value.getLength()); + if (bodies.contains(body)) { + LOG.debug("Found event body: {}", body); + bodies.remove(body); + found++; + } + } + reader.close(); + } + } + if (!bodies.isEmpty()) { + for (String body : bodies) { + LOG.error("Never found event body: {}", body); + } + } + Assert.assertTrue("Found = " + found + ", 
Expected = " + + expected + ", Left = " + bodies.size() + " " + bodies, + bodies.size() == 0); + + } + + private void verifyOutputTextFiles(FileSystem fs, Configuration conf, String dir, String prefix, + List bodies) throws IOException { + int found = 0; + int expected = bodies.size(); + for (String outputFile : getAllFiles(dir)) { + String name = (new File(outputFile)).getName(); + if (name.startsWith(prefix)) { + FSDataInputStream input = fs.open(new Path(outputFile)); + BufferedReader reader = new BufferedReader(new InputStreamReader(input)); + String body = null; + while ((body = reader.readLine()) != null) { + bodies.remove(body); + found++; + } + reader.close(); + } + } + Assert.assertTrue("Found = " + found + ", Expected = " + + expected + ", Left = " + bodies.size() + " " + bodies, + bodies.size() == 0); + + } + + private void verifyOutputAvroFiles(FileSystem fs, Configuration conf, String dir, String prefix, + List bodies) throws IOException { + int found = 0; + int expected = bodies.size(); + for (String outputFile : getAllFiles(dir)) { + String name = (new File(outputFile)).getName(); + if (name.startsWith(prefix)) { + FSDataInputStream input = fs.open(new Path(outputFile)); + DatumReader reader = new GenericDatumReader(); + DataFileStream avroStream = + new DataFileStream(input, reader); + GenericRecord record = new GenericData.Record(avroStream.getSchema()); + while (avroStream.hasNext()) { + avroStream.next(record); + ByteBuffer body = (ByteBuffer) record.get("body"); + CharsetDecoder decoder = Charsets.UTF_8.newDecoder(); + String bodyStr = decoder.decode(body).toString(); + LOG.debug("Removing event: {}", bodyStr); + bodies.remove(bodyStr); + found++; + } + avroStream.close(); + input.close(); + } + } + Assert.assertTrue("Found = " + found + ", Expected = " + + expected + ", Left = " + bodies.size() + " " + bodies, + bodies.size() == 0); + } + + /** + * Ensure that when a write throws an IOException we are + * able to continue to progress in the 
next process() call. + * This relies on Transactional rollback semantics for durability and + * the behavior of the BucketWriter class of close()ing upon IOException. + */ + @Test + public void testCloseReopen() + throws InterruptedException, LifecycleException, EventDeliveryException, IOException { + + LOG.debug("Starting..."); + final int numBatches = 4; + final String fileName = "FlumeData"; + final long rollCount = 5; + final long batchSize = 2; + String newPath = testPath + "/singleBucket"; + int i = 1, j = 1; + + HDFSTestWriterFactory badWriterFactory = new HDFSTestWriterFactory(); + sink = new HDFSEventSink(badWriterFactory); + + // clear the test directory + Configuration conf = new Configuration(); + FileSystem fs = FileSystem.get(conf); + Path dirPath = new Path(newPath); + fs.delete(dirPath, true); + fs.mkdirs(dirPath); + + Context context = new Context(); + + context.put("hdfs.path", newPath); + context.put("hdfs.filePrefix", fileName); + context.put("hdfs.rollCount", String.valueOf(rollCount)); + context.put("hdfs.batchSize", String.valueOf(batchSize)); + context.put("hdfs.fileType", HDFSTestWriterFactory.TestSequenceFileType); + + Configurables.configure(sink, context); + + MemoryChannel channel = new MemoryChannel(); + Configurables.configure(channel, new Context()); + + sink.setChannel(channel); + sink.start(); + + Calendar eventDate = Calendar.getInstance(); + List bodies = Lists.newArrayList(); + // push the event batches into channel + for (i = 1; i < numBatches; i++) { + channel.getTransaction().begin(); + try { + for (j = 1; j <= batchSize; j++) { + Event event = new SimpleEvent(); + eventDate.clear(); + eventDate.set(2011, i, i, i, 0); // yy mm dd + event.getHeaders().put("timestamp", + String.valueOf(eventDate.getTimeInMillis())); + event.getHeaders().put("hostname", "Host" + i); + String body = "Test." + i + "." 
+ j; + event.setBody(body.getBytes()); + bodies.add(body); + // inject fault + event.getHeaders().put("fault-until-reopen", ""); + channel.put(event); + } + channel.getTransaction().commit(); + } finally { + channel.getTransaction().close(); + } + LOG.info("execute sink to process the events: " + sink.process()); + } + LOG.info("clear any events pending due to errors: " + sink.process()); + sink.stop(); + + verifyOutputSequenceFiles(fs, conf, dirPath.toUri().getPath(), fileName, bodies); + } + + /** + * Test that the old bucket writer is closed at the end of rollInterval and + * a new one is used for the next set of events. + */ + @Test + public void testCloseReopenOnRollTime() + throws InterruptedException, LifecycleException, EventDeliveryException, IOException { + + LOG.debug("Starting..."); + final int numBatches = 4; + final String fileName = "FlumeData"; + final long batchSize = 2; + String newPath = testPath + "/singleBucket"; + int i = 1, j = 1; + + HDFSTestWriterFactory badWriterFactory = new HDFSTestWriterFactory(); + sink = new HDFSEventSink(badWriterFactory); + + // clear the test directory + Configuration conf = new Configuration(); + FileSystem fs = FileSystem.get(conf); + Path dirPath = new Path(newPath); + fs.delete(dirPath, true); + fs.mkdirs(dirPath); + + Context context = new Context(); + + context.put("hdfs.path", newPath); + context.put("hdfs.filePrefix", fileName); + context.put("hdfs.rollCount", String.valueOf(0)); + context.put("hdfs.rollSize", String.valueOf(0)); + context.put("hdfs.rollInterval", String.valueOf(2)); + context.put("hdfs.batchSize", String.valueOf(batchSize)); + context.put("hdfs.fileType", HDFSTestWriterFactory.TestSequenceFileType); + + Configurables.configure(sink, context); + + MemoryChannel channel = new MemoryChannel(); + Configurables.configure(channel, new Context()); + + sink.setChannel(channel); + sink.start(); + + Calendar eventDate = Calendar.getInstance(); + List bodies = Lists.newArrayList(); + // push the 
event batches into channel + for (i = 1; i < numBatches; i++) { + channel.getTransaction().begin(); + try { + for (j = 1; j <= batchSize; j++) { + Event event = new SimpleEvent(); + eventDate.clear(); + eventDate.set(2011, i, i, i, 0); // yy mm dd + event.getHeaders().put("timestamp", + String.valueOf(eventDate.getTimeInMillis())); + event.getHeaders().put("hostname", "Host" + i); + String body = "Test." + i + "." + j; + event.setBody(body.getBytes()); + bodies.add(body); + // inject fault + event.getHeaders().put("count-check", ""); + channel.put(event); + } + channel.getTransaction().commit(); + } finally { + channel.getTransaction().close(); + } + LOG.info("execute sink to process the events: " + sink.process()); + // Make sure the first file gets rolled due to rollTimeout. + if (i == 1) { + Thread.sleep(2001); + } + } + LOG.info("clear any events pending due to errors: " + sink.process()); + sink.stop(); + + Assert.assertTrue(badWriterFactory.openCount.get() >= 2); + LOG.info("Total number of bucket writers opened: {}", + badWriterFactory.openCount.get()); + verifyOutputSequenceFiles(fs, conf, dirPath.toUri().getPath(), fileName, + bodies); + } + + /** + * Test that a close due to roll interval removes the bucketwriter from + * sfWriters map. 
+ */ + @Test + public void testCloseRemovesFromSFWriters() + throws InterruptedException, LifecycleException, EventDeliveryException, IOException { + + LOG.debug("Starting..."); + final String fileName = "FlumeData"; + final long batchSize = 2; + String newPath = testPath + "/singleBucket"; + int i = 1, j = 1; + + HDFSTestWriterFactory badWriterFactory = new HDFSTestWriterFactory(); + sink = new HDFSEventSink(badWriterFactory); + + // clear the test directory + Configuration conf = new Configuration(); + FileSystem fs = FileSystem.get(conf); + Path dirPath = new Path(newPath); + fs.delete(dirPath, true); + fs.mkdirs(dirPath); + + Context context = new Context(); + + context.put("hdfs.path", newPath); + context.put("hdfs.filePrefix", fileName); + context.put("hdfs.rollCount", String.valueOf(0)); + context.put("hdfs.rollSize", String.valueOf(0)); + context.put("hdfs.rollInterval", String.valueOf(1)); + context.put("hdfs.batchSize", String.valueOf(batchSize)); + context.put("hdfs.fileType", HDFSTestWriterFactory.TestSequenceFileType); + String expectedLookupPath = newPath + "/FlumeData"; + + Configurables.configure(sink, context); + + MemoryChannel channel = new MemoryChannel(); + Configurables.configure(channel, new Context()); + + sink.setChannel(channel); + sink.start(); + + Calendar eventDate = Calendar.getInstance(); + List bodies = Lists.newArrayList(); + // push the event batches into channel + channel.getTransaction().begin(); + try { + for (j = 1; j <= 2 * batchSize; j++) { + Event event = new SimpleEvent(); + eventDate.clear(); + eventDate.set(2011, i, i, i, 0); // yy mm dd + event.getHeaders().put("timestamp", + String.valueOf(eventDate.getTimeInMillis())); + event.getHeaders().put("hostname", "Host" + i); + String body = "Test." + i + "." 
+ j; + event.setBody(body.getBytes()); + bodies.add(body); + // inject fault + event.getHeaders().put("count-check", ""); + channel.put(event); + } + channel.getTransaction().commit(); + } finally { + channel.getTransaction().close(); + } + LOG.info("execute sink to process the events: " + sink.process()); + Assert.assertTrue(sink.getSfWriters().containsKey(expectedLookupPath)); + // Make sure the first file gets rolled due to rollInterval. + Thread.sleep(2001); + Assert.assertFalse(sink.getSfWriters().containsKey(expectedLookupPath)); + LOG.info("execute sink to process the events: " + sink.process()); + // A new bucket writer should have been created for this bucket, so the + // sfWriters map should contain the same key again. + Assert.assertTrue(sink.getSfWriters().containsKey(expectedLookupPath)); + sink.stop(); + + LOG.info("Total number of bucket writers opened: {}", + badWriterFactory.openCount.get()); + verifyOutputSequenceFiles(fs, conf, dirPath.toUri().getPath(), fileName, + bodies); + } + + + + /* + * append using slow sink writer. 
+ * verify that the process returns backoff due to timeout + */ + @Test + public void testSlowAppendFailure() throws InterruptedException, + LifecycleException, EventDeliveryException, IOException { + + LOG.debug("Starting..."); + final String fileName = "FlumeData"; + final long rollCount = 5; + final long batchSize = 2; + final int numBatches = 2; + String newPath = testPath + "/singleBucket"; + int i = 1, j = 1; + + // clear the test directory + Configuration conf = new Configuration(); + FileSystem fs = FileSystem.get(conf); + Path dirPath = new Path(newPath); + fs.delete(dirPath, true); + fs.mkdirs(dirPath); + + // create HDFS sink with slow writer + HDFSTestWriterFactory badWriterFactory = new HDFSTestWriterFactory(); + sink = new HDFSEventSink(badWriterFactory); + + Context context = new Context(); + context.put("hdfs.path", newPath); + context.put("hdfs.filePrefix", fileName); + context.put("hdfs.rollCount", String.valueOf(rollCount)); + context.put("hdfs.batchSize", String.valueOf(batchSize)); + context.put("hdfs.fileType", HDFSTestWriterFactory.TestSequenceFileType); + context.put("hdfs.callTimeout", Long.toString(1000)); + Configurables.configure(sink, context); + + Channel channel = new MemoryChannel(); + Configurables.configure(channel, context); + + sink.setChannel(channel); + sink.start(); + + Calendar eventDate = Calendar.getInstance(); + + // push the event batches into channel + for (i = 0; i < numBatches; i++) { + Transaction txn = channel.getTransaction(); + txn.begin(); + for (j = 1; j <= batchSize; j++) { + Event event = new SimpleEvent(); + eventDate.clear(); + eventDate.set(2011, i, i, i, 0); // yy mm dd + event.getHeaders().put("timestamp", + String.valueOf(eventDate.getTimeInMillis())); + event.getHeaders().put("hostname", "Host" + i); + event.getHeaders().put("slow", "1500"); + event.setBody(("Test." + i + "." 
+ j).getBytes()); + channel.put(event); + } + txn.commit(); + txn.close(); + + // execute sink to process the events + Status status = sink.process(); + + // verify that the append returned backoff due to timeout + Assert.assertEquals(Status.BACKOFF, status); + } + + sink.stop(); + } + + /* + * append using slow sink writer with specified append timeout + * verify that the data is written correctly to files + */ + private void slowAppendTestHelper(long appendTimeout) + throws InterruptedException, LifecycleException, EventDeliveryException, + IOException { + final String fileName = "FlumeData"; + final long rollCount = 5; + final long batchSize = 2; + final int numBatches = 2; + String newPath = testPath + "/singleBucket"; + int totalEvents = 0; + int i = 1, j = 1; + + // clear the test directory + Configuration conf = new Configuration(); + FileSystem fs = FileSystem.get(conf); + Path dirPath = new Path(newPath); + fs.delete(dirPath, true); + fs.mkdirs(dirPath); + + // create HDFS sink with slow writer + HDFSTestWriterFactory badWriterFactory = new HDFSTestWriterFactory(); + sink = new HDFSEventSink(badWriterFactory); + + Context context = new Context(); + context.put("hdfs.path", newPath); + context.put("hdfs.filePrefix", fileName); + context.put("hdfs.rollCount", String.valueOf(rollCount)); + context.put("hdfs.batchSize", String.valueOf(batchSize)); + context.put("hdfs.fileType", HDFSTestWriterFactory.TestSequenceFileType); + context.put("hdfs.appendTimeout", String.valueOf(appendTimeout)); + Configurables.configure(sink, context); + + Channel channel = new MemoryChannel(); + Configurables.configure(channel, context); + + sink.setChannel(channel); + sink.start(); + + Calendar eventDate = Calendar.getInstance(); + List bodies = Lists.newArrayList(); + // push the event batches into channel + for (i = 0; i < numBatches; i++) { + Transaction txn = channel.getTransaction(); + txn.begin(); + for (j = 1; j <= batchSize; j++) { + Event event = new 
SimpleEvent(); + eventDate.clear(); + eventDate.set(2011, i, i, i, 0); // yy mm dd + event.getHeaders().put("timestamp", + String.valueOf(eventDate.getTimeInMillis())); + event.getHeaders().put("hostname", "Host" + i); + event.getHeaders().put("slow", "1500"); + String body = "Test." + i + "." + j; + event.setBody(body.getBytes()); + bodies.add(body); + channel.put(event); + totalEvents++; + } + txn.commit(); + txn.close(); + + // execute sink to process the events + sink.process(); + } + + sink.stop(); + + // loop through all the files generated and check their contents + FileStatus[] dirStat = fs.listStatus(dirPath); + Path[] fList = FileUtil.stat2Paths(dirStat); + + // check that the roll happened correctly for the given data + // Note that we'll end up with two files with only a head + long expectedFiles = totalEvents / rollCount; + if (totalEvents % rollCount > 0) expectedFiles++; + Assert.assertEquals("num files wrong, found: " + + Lists.newArrayList(fList), expectedFiles, fList.length); + verifyOutputSequenceFiles(fs, conf, dirPath.toUri().getPath(), fileName, bodies); + } + + /* + * append using slow sink writer with long append timeout + * verify that the data is written correctly to files + */ + @Test + public void testSlowAppendWithLongTimeout() throws InterruptedException, + LifecycleException, EventDeliveryException, IOException { + LOG.debug("Starting..."); + slowAppendTestHelper(3000); + } + + /* + * append using slow sink writer with no timeout to make append + * synchronous. 
Verify that the data is written correctly to files + */ + @Test + public void testSlowAppendWithoutTimeout() throws InterruptedException, + LifecycleException, EventDeliveryException, IOException { + LOG.debug("Starting..."); + slowAppendTestHelper(0); + } + @Test + public void testCloseOnIdle() throws IOException, EventDeliveryException, InterruptedException { + String hdfsPath = testPath + "/idleClose"; + + Configuration conf = new Configuration(); + FileSystem fs = FileSystem.get(conf); + Path dirPath = new Path(hdfsPath); + fs.delete(dirPath, true); + fs.mkdirs(dirPath); + Context context = new Context(); + context.put("hdfs.path", hdfsPath); + /* + * All three rolling methods are disabled so the only + * way a file can roll is through the idle timeout. + */ + context.put("hdfs.rollCount", "0"); + context.put("hdfs.rollSize", "0"); + context.put("hdfs.rollInterval", "0"); + context.put("hdfs.batchSize", "2"); + context.put("hdfs.idleTimeout", "1"); + Configurables.configure(sink, context); + + Channel channel = new MemoryChannel(); + Configurables.configure(channel, context); + + sink.setChannel(channel); + sink.start(); + + Transaction txn = channel.getTransaction(); + txn.begin(); + for (int i = 0; i < 10; i++) { + Event event = new SimpleEvent(); + event.setBody(("test event " + i).getBytes()); + channel.put(event); + } + txn.commit(); + txn.close(); + + sink.process(); + sink.process(); + Thread.sleep(1001); + // previous file should have timed out now + // this can throw BucketClosedException(from the bucketWriter having + // closed),this is not an issue as the sink will retry and get a fresh + // bucketWriter so long as the onClose handler properly removes + // bucket writers that were closed. 
+ sink.process(); + sink.process(); + Thread.sleep(500); // shouldn't be enough for a timeout to occur + sink.process(); + sink.process(); + sink.stop(); + FileStatus[] dirStat = fs.listStatus(dirPath); + Path[] fList = FileUtil.stat2Paths(dirStat); + Assert.assertEquals("Incorrect content of the directory " + StringUtils.join(fList, ","), + 2, fList.length); + Assert.assertTrue(!fList[0].getName().endsWith(".tmp") && + !fList[1].getName().endsWith(".tmp")); + fs.close(); + } + + /** + * This test simulates what happens when a batch of events is written to a compressed sequence + * file (and thus hsync'd to hdfs) but the file is not yet closed. + * + * When this happens, the data that we wrote should still be readable. + */ + @Test + public void testBlockCompressSequenceFileWriterSync() throws IOException, EventDeliveryException { + String hdfsPath = testPath + "/sequenceFileWriterSync"; + FileSystem fs = FileSystem.get(new Configuration()); + // Since we are reading a partial file we don't want to use checksums + fs.setVerifyChecksum(false); + fs.setWriteChecksum(false); + + // Compression codecs that don't require native hadoop libraries + String [] codecs = {"BZip2Codec", "DeflateCodec"}; + + for (String codec : codecs) { + sequenceFileWriteAndVerifyEvents(fs, hdfsPath, codec, Collections.singletonList( + "single-event" + )); + + sequenceFileWriteAndVerifyEvents(fs, hdfsPath, codec, Arrays.asList( + "multiple-events-1", + "multiple-events-2", + "multiple-events-3", + "multiple-events-4", + "multiple-events-5" + )); + } + + fs.close(); + } + + private void sequenceFileWriteAndVerifyEvents(FileSystem fs, String hdfsPath, String codec, + Collection eventBodies) + throws IOException, EventDeliveryException { + Path dirPath = new Path(hdfsPath); + fs.delete(dirPath, true); + fs.mkdirs(dirPath); + + Context context = new Context(); + context.put("hdfs.path", hdfsPath); + // Ensure the file isn't closed and rolled + context.put("hdfs.rollCount", 
String.valueOf(eventBodies.size() + 1)); + context.put("hdfs.rollSize", "0"); + context.put("hdfs.rollInterval", "0"); + context.put("hdfs.batchSize", "1"); + context.put("hdfs.fileType", "SequenceFile"); + context.put("hdfs.codeC", codec); + context.put("hdfs.writeFormat", "Writable"); + Configurables.configure(sink, context); + + Channel channel = new MemoryChannel(); + Configurables.configure(channel, context); + + sink.setChannel(channel); + sink.start(); + + for (String eventBody : eventBodies) { + Transaction txn = channel.getTransaction(); + txn.begin(); + + Event event = new SimpleEvent(); + event.setBody(eventBody.getBytes()); + channel.put(event); + + txn.commit(); + txn.close(); + + sink.process(); + } + + // Sink is _not_ closed. The file should remain open but + // the data written should be visible to readers via sync + hflush + FileStatus[] dirStat = fs.listStatus(dirPath); + Path[] paths = FileUtil.stat2Paths(dirStat); + + Assert.assertEquals(1, paths.length); + + SequenceFile.Reader reader = + new SequenceFile.Reader(fs.getConf(), SequenceFile.Reader.stream(fs.open(paths[0]))); + LongWritable key = new LongWritable(); + BytesWritable value = new BytesWritable(); + + for (String eventBody : eventBodies) { + Assert.assertTrue(reader.next(key, value)); + Assert.assertArrayEquals(eventBody.getBytes(), value.copyBytes()); + } + + Assert.assertFalse(reader.next(key, value)); + } + + private Context getContextForRetryTests() { + Context context = new Context(); + + context.put("hdfs.path", testPath + "/%{retryHeader}"); + context.put("hdfs.filePrefix", "test"); + context.put("hdfs.batchSize", String.valueOf(100)); + context.put("hdfs.fileType", "DataStream"); + context.put("hdfs.serializer", "text"); + context.put("hdfs.closeTries","3"); + context.put("hdfs.rollCount", "1"); + context.put("hdfs.retryInterval", "1"); + return context; + } + + @Test + public void testBadConfigurationForRetryIntervalZero() throws Exception { + Context context = 
getContextForRetryTests(); + context.put("hdfs.retryInterval", "0"); + + Configurables.configure(sink, context); + Assert.assertEquals(1, sink.getTryCount()); + } + + @Test + public void testBadConfigurationForRetryIntervalNegative() throws Exception { + Context context = getContextForRetryTests(); + context.put("hdfs.retryInterval", "-1"); + + Configurables.configure(sink, context); + Assert.assertEquals(1, sink.getTryCount()); + } + + @Test + public void testBadConfigurationForRetryCountZero() throws Exception { + Context context = getContextForRetryTests(); + context.put("hdfs.closeTries" ,"0"); + + Configurables.configure(sink, context); + Assert.assertEquals(Integer.MAX_VALUE, sink.getTryCount()); + } + + @Test + public void testBadConfigurationForRetryCountNegative() throws Exception { + Context context = getContextForRetryTests(); + context.put("hdfs.closeTries" ,"-4"); + + Configurables.configure(sink, context); + Assert.assertEquals(Integer.MAX_VALUE, sink.getTryCount()); + } + + @Test + public void testRetryRename() + throws InterruptedException, LifecycleException, EventDeliveryException, IOException { + testRetryRename(true); + testRetryRename(false); + } + + private void testRetryRename(boolean closeSucceed) + throws InterruptedException, LifecycleException, EventDeliveryException, IOException { + LOG.debug("Starting..."); + String newPath = testPath + "/retryBucket"; + + // clear the test directory + Configuration conf = new Configuration(); + FileSystem fs = FileSystem.get(conf); + Path dirPath = new Path(newPath); + fs.delete(dirPath, true); + fs.mkdirs(dirPath); + MockFileSystem mockFs = new MockFileSystem(fs, 6, closeSucceed); + + Context context = getContextForRetryTests(); + Configurables.configure(sink, context); + + Channel channel = new MemoryChannel(); + Configurables.configure(channel, context); + + sink.setChannel(channel); + sink.setMockFs(mockFs); + HDFSWriter hdfsWriter = new MockDataStream(mockFs); + hdfsWriter.configure(context); + 
sink.setMockWriter(hdfsWriter); + sink.start(); + + // push the event batches into channel + for (int i = 0; i < 2; i++) { + Transaction txn = channel.getTransaction(); + txn.begin(); + Map hdr = Maps.newHashMap(); + hdr.put("retryHeader", "v1"); + + channel.put(EventBuilder.withBody("random".getBytes(), hdr)); + txn.commit(); + txn.close(); + + // execute sink to process the events + sink.process(); + } + // push the event batches into channel + for (int i = 0; i < 2; i++) { + Transaction txn = channel.getTransaction(); + txn.begin(); + Map hdr = Maps.newHashMap(); + hdr.put("retryHeader", "v2"); + channel.put(EventBuilder.withBody("random".getBytes(), hdr)); + txn.commit(); + txn.close(); + // execute sink to process the events + sink.process(); + } + + TimeUnit.SECONDS.sleep(5); //Sleep till all retries are done. + + Collection writers = sink.getSfWriters().values(); + + int totalRenameAttempts = 0; + for (BucketWriter writer : writers) { + LOG.info("Rename tries = " + writer.renameTries.get()); + totalRenameAttempts += writer.renameTries.get(); + } + // stop clears the sfWriters map, so we need to compute the + // close tries count before stopping the sink. + sink.stop(); + Assert.assertEquals(6, totalRenameAttempts); + + } +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSinkOnMiniCluster.java b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSinkOnMiniCluster.java new file mode 100644 index 0000000..7c1caaa --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSinkOnMiniCluster.java @@ -0,0 +1,486 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flume.sink.hdfs; + +import com.google.common.base.Charsets; +import java.io.BufferedReader; +import java.io.File; +import java.io.IOException; +import java.io.InputStreamReader; +import java.util.zip.GZIPInputStream; +import org.apache.commons.io.FileUtils; +import org.apache.flume.Context; +import org.apache.flume.EventDeliveryException; +import org.apache.flume.channel.MemoryChannel; +import org.apache.flume.event.EventBuilder; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FileStatus; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hdfs.MiniDFSCluster; +import org.junit.After; +import org.junit.AfterClass; +import org.junit.Assert; +import org.junit.BeforeClass; +import org.junit.Ignore; +import org.junit.Test; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Unit tests that exercise HDFSEventSink on an actual instance of HDFS. + * TODO: figure out how to unit-test Kerberos-secured HDFS. 
+ */ +public class TestHDFSEventSinkOnMiniCluster { + + private static final Logger logger = + LoggerFactory.getLogger(TestHDFSEventSinkOnMiniCluster.class); + + private static final boolean KEEP_DATA = false; + private static final String DFS_DIR = "target/test/dfs"; + private static final String TEST_BUILD_DATA_KEY = "test.build.data"; + + private static MiniDFSCluster cluster = null; + private static String oldTestBuildDataProp = null; + + @BeforeClass + public static void setupClass() throws IOException { + // set up data dir for HDFS + File dfsDir = new File(DFS_DIR); + if (!dfsDir.isDirectory()) { + dfsDir.mkdirs(); + } + // save off system prop to restore later + oldTestBuildDataProp = System.getProperty(TEST_BUILD_DATA_KEY); + System.setProperty(TEST_BUILD_DATA_KEY, DFS_DIR); + } + + private static String getNameNodeURL(MiniDFSCluster cluster) { + int nnPort = cluster.getNameNode().getNameNodeAddress().getPort(); + return "hdfs://localhost:" + nnPort; + } + + /** + * This is a very basic test that writes one event to HDFS and reads it back. 
+ */ + @Test + public void simpleHDFSTest() throws EventDeliveryException, IOException { + cluster = new MiniDFSCluster(new Configuration(), 1, true, null); + cluster.waitActive(); + + String outputDir = "/flume/simpleHDFSTest"; + Path outputDirPath = new Path(outputDir); + + logger.info("Running test with output dir: {}", outputDir); + + FileSystem fs = cluster.getFileSystem(); + // ensure output directory is empty + if (fs.exists(outputDirPath)) { + fs.delete(outputDirPath, true); + } + + String nnURL = getNameNodeURL(cluster); + logger.info("Namenode address: {}", nnURL); + + Context chanCtx = new Context(); + MemoryChannel channel = new MemoryChannel(); + channel.setName("simpleHDFSTest-mem-chan"); + channel.configure(chanCtx); + channel.start(); + + Context sinkCtx = new Context(); + sinkCtx.put("hdfs.path", nnURL + outputDir); + sinkCtx.put("hdfs.fileType", HDFSWriterFactory.DataStreamType); + sinkCtx.put("hdfs.batchSize", Integer.toString(1)); + + HDFSEventSink sink = new HDFSEventSink(); + sink.setName("simpleHDFSTest-hdfs-sink"); + sink.configure(sinkCtx); + sink.setChannel(channel); + sink.start(); + + // create an event + String EVENT_BODY = "yarg!"; + channel.getTransaction().begin(); + try { + channel.put(EventBuilder.withBody(EVENT_BODY, Charsets.UTF_8)); + channel.getTransaction().commit(); + } finally { + channel.getTransaction().close(); + } + + // store event to HDFS + sink.process(); + + // shut down flume + sink.stop(); + channel.stop(); + + // verify that it's in HDFS and that its content is what we say it should be + FileStatus[] statuses = fs.listStatus(outputDirPath); + Assert.assertNotNull("No files found written to HDFS", statuses); + Assert.assertEquals("Only one file expected", 1, statuses.length); + + for (FileStatus status : statuses) { + Path filePath = status.getPath(); + logger.info("Found file on DFS: {}", filePath); + FSDataInputStream stream = fs.open(filePath); + BufferedReader reader = new BufferedReader(new 
InputStreamReader(stream)); + String line = reader.readLine(); + logger.info("First line in file {}: {}", filePath, line); + Assert.assertEquals(EVENT_BODY, line); + } + + if (!KEEP_DATA) { + fs.delete(outputDirPath, true); + } + + cluster.shutdown(); + cluster = null; + } + + /** + * Writes two events to a GZIP-compressed stream and reads the first one back. + */ + @Test + public void simpleHDFSGZipCompressedTest() throws EventDeliveryException, IOException { + cluster = new MiniDFSCluster(new Configuration(), 1, true, null); + cluster.waitActive(); + + String outputDir = "/flume/simpleHDFSGZipCompressedTest"; + Path outputDirPath = new Path(outputDir); + + logger.info("Running test with output dir: {}", outputDir); + + FileSystem fs = cluster.getFileSystem(); + // ensure output directory is empty + if (fs.exists(outputDirPath)) { + fs.delete(outputDirPath, true); + } + + String nnURL = getNameNodeURL(cluster); + logger.info("Namenode address: {}", nnURL); + + Context chanCtx = new Context(); + MemoryChannel channel = new MemoryChannel(); + channel.setName("simpleHDFSTest-mem-chan"); + channel.configure(chanCtx); + channel.start(); + + Context sinkCtx = new Context(); + sinkCtx.put("hdfs.path", nnURL + outputDir); + sinkCtx.put("hdfs.fileType", HDFSWriterFactory.CompStreamType); + sinkCtx.put("hdfs.batchSize", Integer.toString(1)); + sinkCtx.put("hdfs.codeC", "gzip"); + + HDFSEventSink sink = new HDFSEventSink(); + sink.setName("simpleHDFSTest-hdfs-sink"); + sink.configure(sinkCtx); + sink.setChannel(channel); + sink.start(); + + // create two events + String EVENT_BODY_1 = "yarg1"; + String EVENT_BODY_2 = "yarg2"; + channel.getTransaction().begin(); + try { + channel.put(EventBuilder.withBody(EVENT_BODY_1, Charsets.UTF_8)); + channel.put(EventBuilder.withBody(EVENT_BODY_2, Charsets.UTF_8)); + channel.getTransaction().commit(); + } finally { + channel.getTransaction().close(); + } + + // store event to HDFS + sink.process(); + + // shut down flume + sink.stop(); + channel.stop(); + + // verify 
that it's in HDFS and that its content is what we say it should be + FileStatus[] statuses = fs.listStatus(outputDirPath); + Assert.assertNotNull("No files found written to HDFS", statuses); + Assert.assertEquals("Only one file expected", 1, statuses.length); + + for (FileStatus status : statuses) { + Path filePath = status.getPath(); + logger.info("Found file on DFS: {}", filePath); + FSDataInputStream stream = fs.open(filePath); + BufferedReader reader = new BufferedReader(new InputStreamReader( + new GZIPInputStream(stream))); + String line = reader.readLine(); + logger.info("First line in file {}: {}", filePath, line); + Assert.assertEquals(EVENT_BODY_1, line); + + // The rest of this test is commented-out (will fail) for 2 reasons: + // + // (1) At the time of this writing, Hadoop has a bug which causes the + // non-native gzip implementation to create invalid gzip files when + // finish() and resetState() are called. See HADOOP-8522. + // + // (2) Even if HADOOP-8522 is fixed, the JDK GZIPInputStream is unable + // to read multi-member (concatenated) gzip files. See this Sun bug: + // http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4691425 + // + //line = reader.readLine(); + //logger.info("Second line in file {}: {}", filePath, line); + //Assert.assertEquals(EVENT_BODY_2, line); + } + + if (!KEEP_DATA) { + fs.delete(outputDirPath, true); + } + + cluster.shutdown(); + cluster = null; + } + + /** + * Tests that the sink rolls to a new file on process() once a datanode has been stopped and the open file is under-replicated. 
+ */ + @Test + public void underReplicationTest() throws EventDeliveryException, + IOException { + Configuration conf = new Configuration(); + conf.set("dfs.replication", String.valueOf(3)); + cluster = new MiniDFSCluster(conf, 3, true, null); + cluster.waitActive(); + + String outputDir = "/flume/underReplicationTest"; + Path outputDirPath = new Path(outputDir); + + logger.info("Running test with output dir: {}", outputDir); + + FileSystem fs = cluster.getFileSystem(); + // ensure output directory is empty + if (fs.exists(outputDirPath)) { + fs.delete(outputDirPath, true); + } + + String nnURL = getNameNodeURL(cluster); + logger.info("Namenode address: {}", nnURL); + + Context chanCtx = new Context(); + MemoryChannel channel = new MemoryChannel(); + channel.setName("simpleHDFSTest-mem-chan"); + channel.configure(chanCtx); + channel.start(); + + Context sinkCtx = new Context(); + sinkCtx.put("hdfs.path", nnURL + outputDir); + sinkCtx.put("hdfs.fileType", HDFSWriterFactory.DataStreamType); + sinkCtx.put("hdfs.batchSize", Integer.toString(1)); + + HDFSEventSink sink = new HDFSEventSink(); + sink.setName("simpleHDFSTest-hdfs-sink"); + sink.configure(sinkCtx); + sink.setChannel(channel); + sink.start(); + + // create an event + channel.getTransaction().begin(); + try { + channel.put(EventBuilder.withBody("yarg 1", Charsets.UTF_8)); + channel.put(EventBuilder.withBody("yarg 2", Charsets.UTF_8)); + channel.put(EventBuilder.withBody("yarg 3", Charsets.UTF_8)); + channel.put(EventBuilder.withBody("yarg 4", Charsets.UTF_8)); + channel.put(EventBuilder.withBody("yarg 5", Charsets.UTF_8)); + channel.put(EventBuilder.withBody("yarg 5", Charsets.UTF_8)); + channel.getTransaction().commit(); + } finally { + channel.getTransaction().close(); + } + + // store events to HDFS + logger.info("Running process(). Create new file."); + sink.process(); // create new file; + logger.info("Running process(). 
Same file."); + sink.process(); + + // kill a datanode + logger.info("Killing datanode #1..."); + cluster.stopDataNode(0); + + // there is a race here.. the client may or may not notice that the + // datanode is dead before it next sync()s. + // so, this next call may or may not roll a new file. + + logger.info("Running process(). Create new file? (racy)"); + sink.process(); + + logger.info("Running process(). Create new file."); + sink.process(); + + logger.info("Running process(). Create new file."); + sink.process(); + + logger.info("Running process(). Create new file."); + sink.process(); + + // shut down flume + sink.stop(); + channel.stop(); + + // verify that it's in HDFS and that its content is what we say it should be + FileStatus[] statuses = fs.listStatus(outputDirPath); + Assert.assertNotNull("No files found written to HDFS", statuses); + + for (FileStatus status : statuses) { + Path filePath = status.getPath(); + logger.info("Found file on DFS: {}", filePath); + FSDataInputStream stream = fs.open(filePath); + BufferedReader reader = new BufferedReader(new InputStreamReader(stream)); + String line = reader.readLine(); + logger.info("First line in file {}: {}", filePath, line); + Assert.assertTrue(line.startsWith("yarg")); + } + + Assert.assertTrue("4 or 5 files expected, found " + statuses.length, + statuses.length == 4 || statuses.length == 5); + System.out.println("There are " + statuses.length + " files."); + + if (!KEEP_DATA) { + fs.delete(outputDirPath, true); + } + + cluster.shutdown(); + cluster = null; + } + + /** + * Like underReplicationTest, but writes 50 events after stopping a datanode and expects the number of rolled files to stop at 31. 
+ */ + @Ignore("This test is flakey and causes tests to fail pretty often.") + @Test + public void maxUnderReplicationTest() throws EventDeliveryException, + IOException { + Configuration conf = new Configuration(); + conf.set("dfs.replication", String.valueOf(3)); + cluster = new MiniDFSCluster(conf, 3, true, null); + cluster.waitActive(); + + String outputDir = "/flume/underReplicationTest"; + Path outputDirPath = new Path(outputDir); + + logger.info("Running test with output dir: {}", outputDir); + + FileSystem fs = cluster.getFileSystem(); + // ensure output directory is empty + if (fs.exists(outputDirPath)) { + fs.delete(outputDirPath, true); + } + + String nnURL = getNameNodeURL(cluster); + logger.info("Namenode address: {}", nnURL); + + Context chanCtx = new Context(); + MemoryChannel channel = new MemoryChannel(); + channel.setName("simpleHDFSTest-mem-chan"); + channel.configure(chanCtx); + channel.start(); + + Context sinkCtx = new Context(); + sinkCtx.put("hdfs.path", nnURL + outputDir); + sinkCtx.put("hdfs.fileType", HDFSWriterFactory.DataStreamType); + sinkCtx.put("hdfs.batchSize", Integer.toString(1)); + + HDFSEventSink sink = new HDFSEventSink(); + sink.setName("simpleHDFSTest-hdfs-sink"); + sink.configure(sinkCtx); + sink.setChannel(channel); + sink.start(); + + // create an event + channel.getTransaction().begin(); + try { + for (int i = 0; i < 50; i++) { + channel.put(EventBuilder.withBody("yarg " + i, Charsets.UTF_8)); + } + channel.getTransaction().commit(); + } finally { + channel.getTransaction().close(); + } + + // store events to HDFS + logger.info("Running process(). Create new file."); + sink.process(); // create new file; + logger.info("Running process(). Same file."); + sink.process(); + + // kill a datanode + logger.info("Killing datanode #1..."); + cluster.stopDataNode(0); + + // there is a race here.. the client may or may not notice that the + // datanode is dead before it next sync()s. 
+ // so, this next call may or may not roll a new file. + + logger.info("Running process(). Create new file? (racy)"); + sink.process(); + + for (int i = 3; i < 50; i++) { + logger.info("Running process()."); + sink.process(); + } + + // shut down flume + sink.stop(); + channel.stop(); + + // verify that it's in HDFS and that its content is what we say it should be + FileStatus[] statuses = fs.listStatus(outputDirPath); + Assert.assertNotNull("No files found written to HDFS", statuses); + + for (FileStatus status : statuses) { + Path filePath = status.getPath(); + logger.info("Found file on DFS: {}", filePath); + FSDataInputStream stream = fs.open(filePath); + BufferedReader reader = new BufferedReader(new InputStreamReader(stream)); + String line = reader.readLine(); + logger.info("First line in file {}: {}", filePath, line); + Assert.assertTrue(line.startsWith("yarg")); + } + + System.out.println("There are " + statuses.length + " files."); + Assert.assertEquals("31 files expected, found " + statuses.length, + 31, statuses.length); + + if (!KEEP_DATA) { + fs.delete(outputDirPath, true); + } + + cluster.shutdown(); + cluster = null; + } + + @AfterClass + public static void teardownClass() { + // restore system state, if needed + if (oldTestBuildDataProp != null) { + System.setProperty(TEST_BUILD_DATA_KEY, oldTestBuildDataProp); + } + + if (!KEEP_DATA) { + FileUtils.deleteQuietly(new File(DFS_DIR)); + } + } + +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestSequenceFileSerializerFactory.java b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestSequenceFileSerializerFactory.java new file mode 100644 index 0000000..974e857 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestSequenceFileSerializerFactory.java @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.hdfs; + +import org.apache.flume.Context; +import org.junit.Test; + +import static org.junit.Assert.assertTrue; + +public class TestSequenceFileSerializerFactory { + + @Test + public void getTextFormatter() { + SequenceFileSerializer formatter = + SequenceFileSerializerFactory.getSerializer("Text", new Context()); + + assertTrue(formatter != null); + assertTrue(formatter.getClass().getName(), + formatter instanceof HDFSTextSerializer); + } + + @Test + public void getWritableFormatter() { + SequenceFileSerializer formatter = + SequenceFileSerializerFactory.getSerializer("Writable", new Context()); + + assertTrue(formatter != null); + assertTrue(formatter.getClass().getName(), + formatter instanceof HDFSWritableSerializer); + } + + @Test + public void getCustomFormatter() { + SequenceFileSerializer formatter = SequenceFileSerializerFactory.getSerializer( + "org.apache.flume.sink.hdfs.MyCustomSerializer$Builder", new Context()); + + assertTrue(formatter != null); + assertTrue(formatter.getClass().getName(), + formatter instanceof MyCustomSerializer); + } + +} diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestUseRawLocalFileSystem.java 
b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestUseRawLocalFileSystem.java new file mode 100644 index 0000000..f3e7d10 --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestUseRawLocalFileSystem.java @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.flume.sink.hdfs; + +import java.io.File; +import org.apache.commons.io.FileUtils; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.event.EventBuilder; +import org.apache.hadoop.io.SequenceFile.CompressionType; +import org.apache.hadoop.io.compress.GzipCodec; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.google.common.base.Charsets; +import com.google.common.io.Files; + +public class TestUseRawLocalFileSystem { + + private static Logger logger = + LoggerFactory.getLogger(TestUseRawLocalFileSystem.class); + private Context context; + + private File baseDir; + private File testFile; + private Event event; + + @Before + public void setup() throws Exception { + baseDir = Files.createTempDir(); + testFile = new File(baseDir.getAbsoluteFile(), "test"); + context = new Context(); + event = EventBuilder.withBody("test", Charsets.UTF_8); + } + + @After + public void teardown() throws Exception { + FileUtils.deleteQuietly(baseDir); + } + + @Test + public void testTestFile() throws Exception { + String file = testFile.getCanonicalPath(); + HDFSDataStream stream = new HDFSDataStream(); + context.put("hdfs.useRawLocalFileSystem", "true"); + stream.configure(context); + stream.open(file); + stream.append(event); + stream.sync(); + Assert.assertTrue(testFile.length() > 0); + } + @Test + public void testCompressedFile() throws Exception { + String file = testFile.getCanonicalPath(); + HDFSCompressedDataStream stream = new HDFSCompressedDataStream(); + context.put("hdfs.useRawLocalFileSystem", "true"); + stream.configure(context); + stream.open(file, new GzipCodec(), CompressionType.RECORD); + stream.append(event); + stream.sync(); + Assert.assertTrue(testFile.length() > 0); + } + @Test + public void testSequenceFile() throws Exception { + String file = testFile.getCanonicalPath(); + 
HDFSSequenceFile stream = new HDFSSequenceFile(); + context.put("hdfs.useRawLocalFileSystem", "true"); + stream.configure(context); + stream.open(file); + stream.append(event); + stream.sync(); + Assert.assertTrue(testFile.length() > 0); + } + +} \ No newline at end of file diff --git a/code/flume-ng-sinks/flume-hdfs-sink/src/test/resources/log4j.properties b/code/flume-ng-sinks/flume-hdfs-sink/src/test/resources/log4j.properties new file mode 100644 index 0000000..252b5ea --- /dev/null +++ b/code/flume-ng-sinks/flume-hdfs-sink/src/test/resources/log4j.properties @@ -0,0 +1,26 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +log4j.rootLogger = INFO, out + +log4j.appender.out = org.apache.log4j.ConsoleAppender +log4j.appender.out.layout = org.apache.log4j.PatternLayout +log4j.appender.out.layout.ConversionPattern = %d (%t) [%p - %l] %m%n + +log4j.logger.org.apache.flume = DEBUG +log4j.logger.org.apache.hadoop = WARN +log4j.logger.org.mortbay = WARN diff --git a/code/flume-ng-sinks/flume-hive-sink/pom.xml b/code/flume-ng-sinks/flume-hive-sink/pom.xml new file mode 100644 index 0000000..6d9ee47 --- /dev/null +++ b/code/flume-ng-sinks/flume-hive-sink/pom.xml @@ -0,0 +1,186 @@ + + + + + 4.0.0 + + + org.apache.flume + flume-ng-sinks + 1.7.0 + + + org.apache.flume.flume-ng-sinks + flume-hive-sink + Flume NG Hive Sink + + + + + org.apache.rat + apache-rat-plugin + + + + + + + hadoop-1.0 + + + flume.hadoop.profile + 1 + + + + + + org.apache.hadoop + hadoop-core + ${hadoop.version} + test + + + + + hadoop-2 + + + flume.hadoop.profile + 2 + + + + + org.apache.hadoop + hadoop-common + ${hadoop.version} + test + true + + + + org.apache.hadoop + hadoop-mapreduce-client-core + test + ${hadoop.version} + + + + + + hbase-1 + + + !flume.hadoop.profile + + + + + org.apache.hadoop + hadoop-common + test + true + + + + org.apache.hadoop + hadoop-mapreduce-client-core + test + ${hadoop.version} + + + + + + + + + org.apache.flume + flume-ng-sdk + + + + org.apache.flume + flume-ng-configuration + + + + org.apache.flume + flume-ng-core + + + + org.slf4j + slf4j-api + + + + junit + junit + test + + + + org.slf4j + slf4j-log4j12 + test + + + + org.apache.hive.hcatalog + hive-hcatalog-streaming + provided + + + + org.apache.hive.hcatalog + hive-hcatalog-core + provided + ${hive.version} + + + + org.apache.hive + hive-cli + test + + + + + xerces + xercesImpl + runtime + 2.9.1 + + + + xalan + serializer + + + + xalan + xalan + + + + + + diff --git a/code/flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/Config.java 
b/code/flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/Config.java new file mode 100644 index 0000000..b2d2582 --- /dev/null +++ b/code/flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/Config.java @@ -0,0 +1,41 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flume.sink.hive; + +public class Config { + public static final String HIVE_METASTORE = "hive.metastore"; + public static final String HIVE_DATABASE = "hive.database"; + public static final String HIVE_TABLE = "hive.table"; + public static final String HIVE_PARTITION = "hive.partition"; + public static final String HIVE_TXNS_PER_BATCH_ASK = "hive.txnsPerBatchAsk"; + public static final String BATCH_SIZE = "batchSize"; + public static final String IDLE_TIMEOUT = "idleTimeout"; + public static final String CALL_TIMEOUT = "callTimeout"; + public static final String HEART_BEAT_INTERVAL = "heartBeatInterval"; + public static final String MAX_OPEN_CONNECTIONS = "maxOpenConnections"; + public static final String USE_LOCAL_TIME_STAMP = "useLocalTimeStamp"; + public static final String TIME_ZONE = "timeZone"; + public static final String ROUND_UNIT = "roundUnit"; + public static final String ROUND = "round"; + public static final String HOUR = "hour"; + public static final String MINUTE = "minute"; + public static final String SECOND = "second"; + public static final String ROUND_VALUE = "roundValue"; + public static final String SERIALIZER = "serializer"; +} diff --git a/code/flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveDelimitedTextSerializer.java b/code/flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveDelimitedTextSerializer.java new file mode 100644 index 0000000..59520e7 --- /dev/null +++ b/code/flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveDelimitedTextSerializer.java @@ -0,0 +1,115 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flume.sink.hive; + +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.hive.hcatalog.streaming.DelimitedInputWriter; +import org.apache.hive.hcatalog.streaming.HiveEndPoint; +import org.apache.hive.hcatalog.streaming.RecordWriter; +import org.apache.hive.hcatalog.streaming.StreamingException; +import org.apache.hive.hcatalog.streaming.TransactionBatch; + +import java.io.IOException; +import java.util.Collection; + +/** Forwards the incoming event body to Hive unmodified + * Sets up the delimiter and the field to column mapping + */ +public class HiveDelimitedTextSerializer implements HiveEventSerializer { + public static final String ALIAS = "DELIMITED"; + + public static final String defaultDelimiter = ","; + public static final String SERIALIZER_DELIMITER = "serializer.delimiter"; + public static final String SERIALIZER_FIELDNAMES = "serializer.fieldnames"; + public static final String SERIALIZER_SERDE_SEPARATOR = "serializer.serdeSeparator"; + + private String delimiter; + private String[] fieldToColMapping = null; + private Character serdeSeparator = null; + + @Override + public void write(TransactionBatch txnBatch, Event e) + throws StreamingException, IOException, InterruptedException { + txnBatch.write(e.getBody()); + } + + @Override + public void write(TransactionBatch txnBatch, Collection events) + throws 
StreamingException, IOException, InterruptedException { + txnBatch.write(events); + } + + + @Override + public RecordWriter createRecordWriter(HiveEndPoint endPoint) + throws StreamingException, IOException, ClassNotFoundException { + if (serdeSeparator == null) { + return new DelimitedInputWriter(fieldToColMapping, delimiter, endPoint); + } + return new DelimitedInputWriter(fieldToColMapping, delimiter, endPoint, null, serdeSeparator); + } + + @Override + public void configure(Context context) { + delimiter = parseDelimiterSpec( + context.getString(SERIALIZER_DELIMITER, defaultDelimiter) ); + String fieldNames = context.getString(SERIALIZER_FIELDNAMES); + if (fieldNames == null) { + throw new IllegalArgumentException("serializer.fieldnames is not specified " + + "for serializer " + this.getClass().getName() ); + } + String serdeSeparatorStr = context.getString(SERIALIZER_SERDE_SEPARATOR); + this.serdeSeparator = parseSerdeSeparatorSpec(serdeSeparatorStr); + + // split, but preserve empty fields (-1) + fieldToColMapping = fieldNames.trim().split(",",-1); + } + + // if the delimiter is double quoted, like "\t", drop the quotes + private static String parseDelimiterSpec(String delimiter) { + if (delimiter == null) { + return null; + } + if (delimiter.charAt(0) == '"' && + delimiter.charAt(delimiter.length() - 1) == '"') { + return delimiter.substring(1,delimiter.length() - 1); + } + return delimiter; + } + + // if the separator is a single quoted character, like ',', drop the quotes + private static Character parseSerdeSeparatorSpec(String separatorStr) { + if (separatorStr == null) { + return null; + } + if (separatorStr.length() == 1) { + return separatorStr.charAt(0); + } + if (separatorStr.length() == 3 && + separatorStr.charAt(0) == '\'' && + separatorStr.charAt(separatorStr.length() - 1) == '\'') { + return separatorStr.charAt(1); + } + + throw new IllegalArgumentException("serializer.serdeSeparator spec is invalid " + + "for " + ALIAS + " serializer " ); + } + +} diff --git
a/code/flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveEventSerializer.java b/code/flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveEventSerializer.java new file mode 100644 index 0000000..7ed2c82 --- /dev/null +++ b/code/flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveEventSerializer.java @@ -0,0 +1,41 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flume.sink.hive; + +import org.apache.flume.Event; +import org.apache.flume.conf.Configurable; +import org.apache.hive.hcatalog.streaming.HiveEndPoint; +import org.apache.hive.hcatalog.streaming.RecordWriter; +import org.apache.hive.hcatalog.streaming.StreamingException; +import org.apache.hive.hcatalog.streaming.TransactionBatch; + +import java.io.IOException; +import java.util.Collection; + +public interface HiveEventSerializer extends Configurable { + public void write(TransactionBatch batch, Event e) + throws StreamingException, IOException, InterruptedException; + + public void write(TransactionBatch txnBatch, Collection events) + throws StreamingException, IOException, InterruptedException; + + RecordWriter createRecordWriter(HiveEndPoint endPoint) + throws StreamingException, IOException, ClassNotFoundException; + +} diff --git a/code/flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveJsonSerializer.java b/code/flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveJsonSerializer.java new file mode 100644 index 0000000..0311a5b --- /dev/null +++ b/code/flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveJsonSerializer.java @@ -0,0 +1,62 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flume.sink.hive; + +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.hive.hcatalog.streaming.HiveEndPoint; +import org.apache.hive.hcatalog.streaming.RecordWriter; +import org.apache.hive.hcatalog.streaming.StreamingException; +import org.apache.hive.hcatalog.streaming.StrictJsonWriter; +import org.apache.hive.hcatalog.streaming.TransactionBatch; + +import java.io.IOException; +import java.util.Collection; + +/** Forwards the incoming event body to Hive unmodified. + * Records are written with a StrictJsonWriter; there is nothing to configure. + */ + +public class HiveJsonSerializer implements HiveEventSerializer { + public static final String ALIAS = "JSON"; + + @Override + public void write(TransactionBatch txnBatch, Event e) + throws StreamingException, IOException, InterruptedException { + txnBatch.write(e.getBody()); + } + + @Override + public void write(TransactionBatch txnBatch, Collection events) + throws StreamingException, IOException, InterruptedException { + txnBatch.write(events); + } + + @Override + public RecordWriter createRecordWriter(HiveEndPoint endPoint) + throws StreamingException, IOException, ClassNotFoundException { + return new StrictJsonWriter(endPoint); + } + + @Override + public void configure(Context context) { + return; + } + +} diff --git a/code/flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveSink.java b/code/flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveSink.java new file mode 100644 index 0000000..cc5cdca --- /dev/null +++ b/code/flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveSink.java @@ -0,0 +1,522 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements.
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flume.sink.hive; + +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Preconditions; +import com.google.common.collect.Lists; +import com.google.common.collect.Maps; +import com.google.common.util.concurrent.ThreadFactoryBuilder; +import org.apache.flume.Channel; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.EventDeliveryException; +import org.apache.flume.Transaction; +import org.apache.flume.conf.Configurable; +import org.apache.flume.formatter.output.BucketPath; +import org.apache.flume.instrumentation.SinkCounter; +import org.apache.flume.sink.AbstractSink; +import org.apache.hive.hcatalog.streaming.HiveEndPoint; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Calendar; +import java.util.List; +import java.util.Map; +import java.util.Map.Entry; +import java.util.TimeZone; +import java.util.Timer; +import java.util.TimerTask; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicBoolean; + +public class HiveSink extends AbstractSink implements Configurable { + + private 
static final Logger LOG = LoggerFactory.getLogger(HiveSink.class); + + private static final int DEFAULT_MAXOPENCONNECTIONS = 500; + private static final int DEFAULT_TXNSPERBATCH = 100; + private static final int DEFAULT_BATCHSIZE = 15000; + private static final int DEFAULT_CALLTIMEOUT = 10000; + private static final int DEFAULT_IDLETIMEOUT = 0; + private static final int DEFAULT_HEARTBEATINTERVAL = 240; // seconds + + private Map allWriters; + + private SinkCounter sinkCounter; + private volatile int idleTimeout; + private String metaStoreUri; + private String proxyUser; + private String database; + private String table; + private List partitionVals; + private Integer txnsPerBatchAsk; + private Integer batchSize; + private Integer maxOpenConnections; + private boolean autoCreatePartitions; + private String serializerType; + private HiveEventSerializer serializer; + + /** + * Default timeout for blocking I/O calls in HiveWriter + */ + private Integer callTimeout; + private Integer heartBeatInterval; + + private ExecutorService callTimeoutPool; + + private boolean useLocalTime; + private TimeZone timeZone; + private boolean needRounding; + private int roundUnit; + private Integer roundValue; + + private Timer heartBeatTimer = new Timer(); + private AtomicBoolean timeToSendHeartBeat = new AtomicBoolean(false); + + @VisibleForTesting + Map getAllWriters() { + return allWriters; + } + + // read configuration and setup thresholds + @Override + public void configure(Context context) { + + metaStoreUri = context.getString(Config.HIVE_METASTORE); + if (metaStoreUri == null) { + throw new IllegalArgumentException(Config.HIVE_METASTORE + " config setting is not " + + "specified for sink " + getName()); + } + if (metaStoreUri.equalsIgnoreCase("null")) { // for testing support + metaStoreUri = null; + } + proxyUser = null; // context.getString("hive.proxyUser"); not supported by hive api yet + database = context.getString(Config.HIVE_DATABASE); + if (database == null) { + throw 
new IllegalArgumentException(Config.HIVE_DATABASE + " config setting is not " + + "specified for sink " + getName()); + } + table = context.getString(Config.HIVE_TABLE); + if (table == null) { + throw new IllegalArgumentException(Config.HIVE_TABLE + " config setting is not " + + "specified for sink " + getName()); + } + + String partitions = context.getString(Config.HIVE_PARTITION); + if (partitions != null) { + partitionVals = Arrays.asList(partitions.split(",")); + } + + + txnsPerBatchAsk = context.getInteger(Config.HIVE_TXNS_PER_BATCH_ASK, DEFAULT_TXNSPERBATCH); + if (txnsPerBatchAsk < 0) { + LOG.warn(getName() + ". hive.txnsPerBatchAsk must be positive number. Defaulting to " + + DEFAULT_TXNSPERBATCH); + txnsPerBatchAsk = DEFAULT_TXNSPERBATCH; + } + batchSize = context.getInteger(Config.BATCH_SIZE, DEFAULT_BATCHSIZE); + if (batchSize < 0) { + LOG.warn(getName() + ". batchSize must be positive number. Defaulting to " + + DEFAULT_BATCHSIZE); + batchSize = DEFAULT_BATCHSIZE; + } + idleTimeout = context.getInteger(Config.IDLE_TIMEOUT, DEFAULT_IDLETIMEOUT); + if (idleTimeout < 0) { + LOG.warn(getName() + ". idleTimeout must be positive number. Defaulting to " + + DEFAULT_IDLETIMEOUT); + idleTimeout = DEFAULT_IDLETIMEOUT; + } + callTimeout = context.getInteger(Config.CALL_TIMEOUT, DEFAULT_CALLTIMEOUT); + if (callTimeout < 0) { + LOG.warn(getName() + ". callTimeout must be positive number. Defaulting to " + + DEFAULT_CALLTIMEOUT); + callTimeout = DEFAULT_CALLTIMEOUT; + } + + heartBeatInterval = context.getInteger(Config.HEART_BEAT_INTERVAL, DEFAULT_HEARTBEATINTERVAL); + if (heartBeatInterval < 0) { + LOG.warn(getName() + ". heartBeatInterval must be positive number. 
Defaulting to " + + DEFAULT_HEARTBEATINTERVAL); + heartBeatInterval = DEFAULT_HEARTBEATINTERVAL; + } + maxOpenConnections = context.getInteger(Config.MAX_OPEN_CONNECTIONS, + DEFAULT_MAXOPENCONNECTIONS); + autoCreatePartitions = context.getBoolean("autoCreatePartitions", true); + + // Timestamp processing + useLocalTime = context.getBoolean(Config.USE_LOCAL_TIME_STAMP, false); + + String tzName = context.getString(Config.TIME_ZONE); + timeZone = (tzName == null) ? null : TimeZone.getTimeZone(tzName); + needRounding = context.getBoolean(Config.ROUND, false); + + String unit = context.getString(Config.ROUND_UNIT, Config.MINUTE); + if (unit.equalsIgnoreCase(Config.HOUR)) { + this.roundUnit = Calendar.HOUR_OF_DAY; + } else if (unit.equalsIgnoreCase(Config.MINUTE)) { + this.roundUnit = Calendar.MINUTE; + } else if (unit.equalsIgnoreCase(Config.SECOND)) { + this.roundUnit = Calendar.SECOND; + } else { + LOG.warn(getName() + ". Rounding unit is not valid, please set one of " + + "minute, hour or second. 
Rounding will be disabled"); + needRounding = false; + } + this.roundValue = context.getInteger(Config.ROUND_VALUE, 1); + if (roundUnit == Calendar.SECOND || roundUnit == Calendar.MINUTE) { + Preconditions.checkArgument(roundValue > 0 && roundValue <= 60, + "Round value must be > 0 and <= 60"); + } else if (roundUnit == Calendar.HOUR_OF_DAY) { + Preconditions.checkArgument(roundValue > 0 && roundValue <= 24, + "Round value must be > 0 and <= 24"); + } + + // Serializer + serializerType = context.getString(Config.SERIALIZER, ""); + if (serializerType.isEmpty()) { + throw new IllegalArgumentException("serializer config setting is not " + + "specified for sink " + getName()); + } + + serializer = createSerializer(serializerType); + serializer.configure(context); + + Preconditions.checkArgument(batchSize > 0, "batchSize must be greater than 0"); + + if (sinkCounter == null) { + sinkCounter = new SinkCounter(getName()); + } + } + + @VisibleForTesting + protected SinkCounter getCounter() { + return sinkCounter; + } + private HiveEventSerializer createSerializer(String serializerName) { + if (serializerName.compareToIgnoreCase(HiveDelimitedTextSerializer.ALIAS) == 0 || + serializerName.compareTo(HiveDelimitedTextSerializer.class.getName()) == 0) { + return new HiveDelimitedTextSerializer(); + } else if (serializerName.compareToIgnoreCase(HiveJsonSerializer.ALIAS) == 0 || + serializerName.compareTo(HiveJsonSerializer.class.getName()) == 0) { + return new HiveJsonSerializer(); + } + + try { + return (HiveEventSerializer) Class.forName(serializerName).newInstance(); + } catch (Exception e) { + throw new IllegalArgumentException("Unable to instantiate serializer: " + serializerName + + " on sink: " + getName(), e); + } + } + + + /** + * Pull events out of channel, find corresponding HiveWriter and write to it. + * Take at most batchSize events per Transaction.
+ * This method is not thread safe. + */ + public Status process() throws EventDeliveryException { + // writers used in this Txn + + Channel channel = getChannel(); + Transaction transaction = channel.getTransaction(); + transaction.begin(); + boolean success = false; + try { + // 1 Enable Heart Beats + if (timeToSendHeartBeat.compareAndSet(true, false)) { + enableHeartBeatOnAllWriters(); + } + + // 2 Drain Batch + int txnEventCount = drainOneBatch(channel); + transaction.commit(); + success = true; + + // 3 Update Counters + if (txnEventCount < 1) { + return Status.BACKOFF; + } else { + return Status.READY; + } + } catch (InterruptedException err) { + LOG.warn(getName() + ": Thread was interrupted.", err); + return Status.BACKOFF; + } catch (Exception e) { + throw new EventDeliveryException(e); + } finally { + if (!success) { + transaction.rollback(); + } + transaction.close(); + } + } + + // Drains one batch of events from Channel into Hive + private int drainOneBatch(Channel channel) + throws HiveWriter.Failure, InterruptedException { + int txnEventCount = 0; + try { + Map activeWriters = Maps.newHashMap(); + for (; txnEventCount < batchSize; ++txnEventCount) { + // 0) Read event from Channel + Event event = channel.take(); + if (event == null) { + break; + } + + //1) Create end point by substituting place holders + HiveEndPoint endPoint = makeEndPoint(metaStoreUri, database, table, + partitionVals, event.getHeaders(), timeZone, + needRounding, roundUnit, roundValue, useLocalTime); + + //2) Create or reuse Writer + HiveWriter writer = getOrCreateWriter(activeWriters, endPoint); + + //3) Write + LOG.debug("{} : Writing event to {}", getName(), endPoint); + writer.write(event); + + } // for + + //4) Update counters + if (txnEventCount == 0) { + sinkCounter.incrementBatchEmptyCount(); + } else if (txnEventCount == batchSize) { + sinkCounter.incrementBatchCompleteCount(); + } else { + sinkCounter.incrementBatchUnderflowCount(); + } + 
sinkCounter.addToEventDrainAttemptCount(txnEventCount); + + + // 5) Flush all Writers + for (HiveWriter writer : activeWriters.values()) { + writer.flush(true); + } + + sinkCounter.addToEventDrainSuccessCount(txnEventCount); + return txnEventCount; + } catch (HiveWriter.Failure e) { + // in case of error we close all TxnBatches to start clean next time + LOG.warn(getName() + " : " + e.getMessage(), e); + abortAllWriters(); + closeAllWriters(); + throw e; + } + } + + private void enableHeartBeatOnAllWriters() { + for (HiveWriter writer : allWriters.values()) { + writer.setHearbeatNeeded(); + } + } + + private HiveWriter getOrCreateWriter(Map activeWriters, + HiveEndPoint endPoint) + throws HiveWriter.ConnectException, InterruptedException { + try { + HiveWriter writer = allWriters.get( endPoint ); + if (writer == null) { + LOG.info(getName() + ": Creating Writer to Hive end point : " + endPoint); + writer = new HiveWriter(endPoint, txnsPerBatchAsk, autoCreatePartitions, + callTimeout, callTimeoutPool, proxyUser, serializer, sinkCounter); + + sinkCounter.incrementConnectionCreatedCount(); + if (allWriters.size() > maxOpenConnections) { + int retired = closeIdleWriters(); + if (retired == 0) { + closeEldestWriter(); + } + } + allWriters.put(endPoint, writer); + activeWriters.put(endPoint, writer); + } else { + if (activeWriters.get(endPoint) == null) { + activeWriters.put(endPoint,writer); + } + } + return writer; + } catch (HiveWriter.ConnectException e) { + sinkCounter.incrementConnectionFailedCount(); + throw e; + } + + } + + private HiveEndPoint makeEndPoint(String metaStoreUri, String database, String table, + List partVals, Map headers, + TimeZone timeZone, boolean needRounding, + int roundUnit, Integer roundValue, + boolean useLocalTime) { + if (partVals == null) { + return new HiveEndPoint(metaStoreUri, database, table, null); + } + + ArrayList realPartVals = Lists.newArrayList(); + for (String partVal : partVals) { + 
realPartVals.add(BucketPath.escapeString(partVal, headers, timeZone, + needRounding, roundUnit, roundValue, useLocalTime)); + } + return new HiveEndPoint(metaStoreUri, database, table, realPartVals); + } + + /** + * Locate the writer that has not been used for the longest time and retire it + */ + private void closeEldestWriter() throws InterruptedException { + long oldestTimeStamp = System.currentTimeMillis(); + HiveEndPoint eldest = null; + for (Entry entry : allWriters.entrySet()) { + if (entry.getValue().getLastUsed() < oldestTimeStamp) { + eldest = entry.getKey(); + oldestTimeStamp = entry.getValue().getLastUsed(); + } + } + + try { + sinkCounter.incrementConnectionClosedCount(); + LOG.info(getName() + ": Closing least used Writer to Hive EndPoint : " + eldest); + allWriters.remove(eldest).close(); + } catch (InterruptedException e) { + LOG.warn(getName() + ": Interrupted when attempting to close writer for end point: " + + eldest, e); + throw e; + } + } + + /** + * Locate all writers past idle timeout and retire them + * @return number of writers retired + */ + private int closeIdleWriters() throws InterruptedException { + int count = 0; + long now = System.currentTimeMillis(); + ArrayList retirees = Lists.newArrayList(); + + //1) Find retirement candidates + for (Entry entry : allWriters.entrySet()) { + if (now - entry.getValue().getLastUsed() > idleTimeout) { + ++count; + retirees.add(entry.getKey()); + } + } + //2) Retire them + for (HiveEndPoint ep : retirees) { + sinkCounter.incrementConnectionClosedCount(); + LOG.info(getName() + ": Closing idle Writer to Hive end point : {}", ep); + allWriters.remove(ep).close(); + } + return count; + } + + /** + * Closes all writers and removes them from the cache + * @return number of writers retired + */ + private void closeAllWriters() throws InterruptedException { + //1) Retire writers + for (Entry entry : allWriters.entrySet()) { + entry.getValue().close(); + } + + //2) Clear cache + allWriters.clear(); + } + + /** + * Abort
current Txn on all writers + * @return number of writers retired + */ + private void abortAllWriters() throws InterruptedException { + for (Entry entry : allWriters.entrySet()) { + entry.getValue().abort(); + } + } + + @Override + public void stop() { + // do not constrain close() calls with a timeout + for (Entry entry : allWriters.entrySet()) { + try { + HiveWriter w = entry.getValue(); + w.close(); + } catch (InterruptedException ex) { + Thread.currentThread().interrupt(); + } + } + + // shut down all thread pools + callTimeoutPool.shutdown(); + try { + while (callTimeoutPool.isTerminated() == false) { + callTimeoutPool.awaitTermination( + Math.max(DEFAULT_CALLTIMEOUT, callTimeout), TimeUnit.MILLISECONDS); + } + } catch (InterruptedException ex) { + LOG.warn(getName() + ":Shutdown interrupted on " + callTimeoutPool, ex); + } + + callTimeoutPool = null; + allWriters.clear(); + allWriters = null; + sinkCounter.stop(); + super.stop(); + LOG.info("Hive Sink {} stopped", getName() ); + } + + @Override + public void start() { + String timeoutName = "hive-" + getName() + "-call-runner-%d"; + // call timeout pool needs only 1 thd as sink is effectively single threaded + callTimeoutPool = Executors.newFixedThreadPool(1, + new ThreadFactoryBuilder().setNameFormat(timeoutName).build()); + + this.allWriters = Maps.newHashMap(); + sinkCounter.start(); + super.start(); + setupHeartBeatTimer(); + LOG.info(getName() + ": Hive Sink {} started", getName() ); + } + + private void setupHeartBeatTimer() { + if (heartBeatInterval > 0) { + heartBeatTimer.schedule(new TimerTask() { + @Override + public void run() { + timeToSendHeartBeat.set(true); + setupHeartBeatTimer(); + } + }, heartBeatInterval * 1000); + } + } + + + @Override + public String toString() { + return "{ Sink type:" + getClass().getSimpleName() + ", name:" + getName() + + " }"; + } + +} diff --git a/code/flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveWriter.java 
b/code/flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveWriter.java
new file mode 100644
index 0000000..7106696
--- /dev/null
+++ b/code/flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveWriter.java
@@ -0,0 +1,513 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flume.sink.hive;
+
+import org.apache.flume.Event;
+import org.apache.flume.instrumentation.SinkCounter;
+import org.apache.hive.hcatalog.streaming.HiveEndPoint;
+import org.apache.hive.hcatalog.streaming.RecordWriter;
+import org.apache.hive.hcatalog.streaming.SerializationError;
+import org.apache.hive.hcatalog.streaming.StreamingConnection;
+import org.apache.hive.hcatalog.streaming.StreamingException;
+import org.apache.hive.hcatalog.streaming.StreamingIOFailure;
+import org.apache.hive.hcatalog.streaming.TransactionBatch;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * Internal API intended for HiveSink use.
+ */
+class HiveWriter {
+
+  private static final Logger LOG = LoggerFactory.getLogger(HiveWriter.class);
+
+  private final HiveEndPoint endPoint;
+  private HiveEventSerializer serializer;
+  private final StreamingConnection connection;
+  private final int txnsPerBatch;
+  private final RecordWriter recordWriter;
+  private TransactionBatch txnBatch;
+
+  private final ExecutorService callTimeoutPool;
+
+  private final long callTimeout;
+
+  private long lastUsed; // time of last flush on this writer
+
+  private SinkCounter sinkCounter;
+  private int batchCounter;
+  private long eventCounter;
+  private long processSize;
+
+  protected boolean closed; // flag indicating HiveWriter was closed
+  private boolean autoCreatePartitions;
+
+  private boolean hearbeatNeeded = false;
+
+  private final int writeBatchSz = 1000;
+  private ArrayList<Event> batch = new ArrayList<Event>(writeBatchSz);
+
+  HiveWriter(HiveEndPoint endPoint, int txnsPerBatch,
+             boolean autoCreatePartitions, long callTimeout,
+             ExecutorService callTimeoutPool, String hiveUser,
+             HiveEventSerializer serializer, SinkCounter sinkCounter)
+      throws ConnectException, InterruptedException {
+    try {
+      this.autoCreatePartitions = autoCreatePartitions;
+      this.sinkCounter = sinkCounter;
+      this.callTimeout = callTimeout;
+      this.callTimeoutPool = callTimeoutPool;
+      this.endPoint = endPoint;
+      this.connection = newConnection(hiveUser);
+      this.txnsPerBatch = txnsPerBatch;
+      this.serializer = serializer;
+      this.recordWriter = serializer.createRecordWriter(endPoint);
+      this.txnBatch = nextTxnBatch(recordWriter);
+      this.txnBatch.beginNextTransaction();
+      this.closed = false;
+      this.lastUsed = System.currentTimeMillis();
+    } catch (InterruptedException e) {
+      throw e;
+    } catch (RuntimeException e) {
+      throw e;
+    } catch (Exception e) {
+      throw new ConnectException(endPoint, e);
+    }
+  }
+
+  @Override
+  public String toString() {
+    return endPoint.toString();
+  }
+
+  /**
+   * Clear the class counters
+   */
+  private void resetCounters() {
+    eventCounter = 0;
+    processSize = 0;
+    batchCounter = 0;
+  }
+
+  void setHearbeatNeeded() {
+    hearbeatNeeded = true;
+  }
+
+  public int getRemainingTxns() {
+    return txnBatch.remainingTransactions();
+  }
+
+
+  /**
+   * Buffer the event, write the buffer out once it is full, and update stats
+   * @param event the event to buffer for writing
+   * @throws WriteException on a streaming I/O error or call timeout
+   * @throws InterruptedException if interrupted while writing a full buffer
+   */
+  public synchronized void write(final Event event)
+      throws WriteException, InterruptedException {
+    if (closed) {
+      throw new IllegalStateException("Writer closed. Cannot write to : " + endPoint);
+    }
+
+    batch.add(event);
+    if (batch.size() == writeBatchSz) {
+      // buffer is full: hand the buffered events to the serializer
+      writeEventBatchToSerializer();
+    }
+
+    // Update Statistics
+    processSize += event.getBody().length;
+    eventCounter++;
+  }
+
+  private void writeEventBatchToSerializer()
+      throws InterruptedException, WriteException {
+    try {
+      timedCall(new CallRunner1<Void>() {
+        @Override
+        public Void call() throws InterruptedException, StreamingException {
+          try {
+            for (Event event : batch) {
+              try {
+                serializer.write(txnBatch, event);
+              } catch (SerializationError err) {
+                LOG.info("Parse failed : {}  : {}", err.getMessage(), new String(event.getBody()));
+              }
+            }
+            return null;
+          } catch (IOException e) {
+            throw new StreamingIOFailure(e.getMessage(), e);
+          }
+        }
+      });
+      batch.clear();
+    } catch (StreamingException e) {
+      throw new WriteException(endPoint, txnBatch.getCurrentTxnId(), e);
+    } catch (TimeoutException e) {
+      throw new WriteException(endPoint, txnBatch.getCurrentTxnId(), e);
+    }
+  }
+
+  /**
+   * Commits the current Txn.
+   * If 'rollToNext' is true, will switch to next Txn in batch or to a
+   *       new TxnBatch if current Txn batch is exhausted
+   */
+  public void flush(boolean rollToNext)
+      throws CommitException, TxnBatchException, TxnFailure, InterruptedException,
+      WriteException {
+    if (!batch.isEmpty()) {
+      writeEventBatchToSerializer();
+      batch.clear();
+    }
+
+    //0 Heart beat on TxnBatch
+    if (hearbeatNeeded) {
+      hearbeatNeeded = false;
+      heartBeat();
+    }
+    lastUsed = System.currentTimeMillis();
+
+    try {
+      //1 commit txn & close batch if needed
+      commitTxn();
+      if (txnBatch.remainingTransactions() == 0) {
+        closeTxnBatch();
+        txnBatch = null;
+        if (rollToNext) {
+          txnBatch = nextTxnBatch(recordWriter);
+        }
+      }
+
+      //2 roll to next Txn
+      if (rollToNext) {
+        LOG.debug("Switching to next Txn for {}", endPoint);
+        txnBatch.beginNextTransaction(); // does not block
+      }
+    } catch (StreamingException e) {
+      throw new TxnFailure(txnBatch, e);
+    }
+  }
+
+  /**
+   * Aborts the current Txn
+   * @throws InterruptedException
+   */
+  public void abort() throws InterruptedException {
+    batch.clear();
+    abortTxn();
+  }
+
+  /** Sends a heartbeat on the current and remaining txns through the
+   *  callTimeoutPool, waiting for it to complete or for callTimeout to expire
+   */
+  public void heartBeat() throws InterruptedException {
+    // 1) schedule the heartbeat on one thread in pool
+    try {
+      timedCall(new CallRunner1<Void>() {
+        @Override
+        public Void call() throws StreamingException {
+          LOG.info("Sending heartbeat on batch " + txnBatch);
+          txnBatch.heartbeat();
+          return null;
+        }
+      });
+    } catch (InterruptedException e) {
+      throw e;
+    } catch (Exception e) {
+      LOG.warn("Unable to send heartbeat on Txn Batch " + txnBatch, e);
+      // Suppressing exceptions as we don't care for errors on heartbeats
+    }
+  }
+
+  /**
+   * Close the Transaction Batch and connection.
+   * Errors while aborting or closing are logged and suppressed.
+   * @throws InterruptedException
+   */
+  public void close() throws InterruptedException {
+    batch.clear();
+    abortRemainingTxns();
+    closeTxnBatch();
+    closeConnection();
+    closed = true;
+  }
+
+
+  private void abortRemainingTxns() throws InterruptedException {
+    try {
+      if (!isClosed(txnBatch.getCurrentTransactionState())) {
+        abortCurrTxnHelper();
+      }
+
+      // recursively abort remaining txns
+      if (txnBatch.remainingTransactions() > 0) {
+        timedCall(
+            new CallRunner1<Void>() {
+              @Override
+              public Void call() throws StreamingException, InterruptedException {
+                txnBatch.beginNextTransaction();
+                return null;
+              }
+            });
+        abortRemainingTxns();
+      }
+    } catch (StreamingException e) {
+      LOG.warn("Error when aborting remaining transactions in batch " + txnBatch, e);
+      return;
+    } catch (TimeoutException e) {
+      LOG.warn("Timed out when aborting remaining transactions in batch " + txnBatch, e);
+      return;
+    }
+  }
+
+  private void abortCurrTxnHelper() throws TimeoutException, InterruptedException {
+    try {
+      timedCall(
+          new CallRunner1<Void>() {
+            @Override
+            public Void call() throws StreamingException, InterruptedException {
+              txnBatch.abort();
+              LOG.info("Aborted txn " + txnBatch.getCurrentTxnId());
+              return null;
+            }
+          }
+      );
+    } catch (StreamingException e) {
+      LOG.warn("Unable to abort transaction " + txnBatch.getCurrentTxnId(), e);
+      // continue to attempt to abort other txns in the batch
+    }
+  }
+
+  private boolean isClosed(TransactionBatch.TxnState txnState) {
+    if (txnState == TransactionBatch.TxnState.COMMITTED) {
+      return true;
+    }
+    if (txnState == TransactionBatch.TxnState.ABORTED) {
+      return true;
+    }
+    return false;
+  }
+
+  public void closeConnection() throws InterruptedException {
+    LOG.info("Closing connection to EndPoint : {}", endPoint);
+    try {
+      timedCall(new CallRunner1<Void>() {
+        @Override
+        public Void call() {
+          connection.close(); // could block
+          return null;
+        }
+      });
+      sinkCounter.incrementConnectionClosedCount();
+    } catch (Exception e) {
+      LOG.warn("Error closing connection to EndPoint : " + endPoint, e);
+      // Suppressing exceptions as we don't care for errors on connection close
+    }
+  }
+
+  private void commitTxn() throws CommitException, InterruptedException {
+    if (LOG.isInfoEnabled()) {
+      LOG.info("Committing Txn " + txnBatch.getCurrentTxnId() + " on EndPoint: " + endPoint);
+    }
+    try {
+      timedCall(new CallRunner1<Void>() {
+        @Override
+        public Void call() throws StreamingException, InterruptedException {
+          txnBatch.commit(); // could block
+          return null;
+        }
+      });
+    } catch (Exception e) {
+      throw new CommitException(endPoint, txnBatch.getCurrentTxnId(), e);
+    }
+  }
+
+  private void abortTxn() throws InterruptedException {
+    LOG.info("Aborting Txn id {} on End Point {}", txnBatch.getCurrentTxnId(), endPoint);
+    try {
+      timedCall(new CallRunner1<Void>() {
+        @Override
+        public Void call() throws StreamingException, InterruptedException {
+          txnBatch.abort(); // could block
+          return null;
+        }
+      });
+    } catch (InterruptedException e) {
+      throw e;
+    } catch (TimeoutException e) {
+      LOG.warn("Timeout while aborting Txn " + txnBatch.getCurrentTxnId() +
+               " on EndPoint: " + endPoint, e);
+    } catch (Exception e) {
+      LOG.warn("Error aborting Txn " + txnBatch.getCurrentTxnId() + " on EndPoint: " + endPoint, e);
+      // Suppressing exceptions as we don't care for errors on abort
+    }
+  }
+
+  private StreamingConnection newConnection(final String proxyUser)
+      throws InterruptedException, ConnectException {
+    try {
+      return timedCall(new CallRunner1<StreamingConnection>() {
+        @Override
+        public StreamingConnection call() throws InterruptedException, StreamingException {
+          return endPoint.newConnection(autoCreatePartitions); // could block
+        }
+      });
+    } catch (Exception e) {
+      throw new ConnectException(endPoint, e);
+    }
+  }
+
+  private TransactionBatch nextTxnBatch(final RecordWriter recordWriter)
+      throws InterruptedException, TxnBatchException {
+    LOG.debug("Fetching new Txn Batch for {}", endPoint);
+    TransactionBatch batch = null;
+    try {
+      batch = timedCall(new CallRunner1<TransactionBatch>() {
+        @Override
+        public TransactionBatch call() throws InterruptedException, StreamingException {
+          return connection.fetchTransactionBatch(txnsPerBatch, recordWriter); // could block
+        }
+      });
+      LOG.info("Acquired Transaction batch {}", batch);
+    } catch (Exception e) {
+      throw new TxnBatchException(endPoint, e);
+    }
+    return batch;
+  }
+
+  private void closeTxnBatch() throws InterruptedException {
+    try {
+      LOG.info("Closing Txn Batch {}.", txnBatch);
+      timedCall(new CallRunner1<Void>() {
+        @Override
+        public Void call() throws InterruptedException, StreamingException {
+          txnBatch.close(); // could block
+          return null;
+        }
+      });
+    } catch (InterruptedException e) {
+      throw e;
+    } catch (Exception e) {
+      LOG.warn("Error closing Txn Batch " + txnBatch, e);
+      // Suppressing exceptions as we don't care for errors on batch close
+    }
+  }
+
+  private <T> T timedCall(final CallRunner1<T> callRunner)
+      throws TimeoutException, InterruptedException, StreamingException {
+    Future<T> future = callTimeoutPool.submit(new Callable<T>() {
+      @Override
+      public T call() throws StreamingException, InterruptedException, Failure {
+        return callRunner.call();
+      }
+    });
+
+    try {
+      if (callTimeout > 0) {
+        return future.get(callTimeout, TimeUnit.MILLISECONDS);
+      } else {
+        return future.get();
+      }
+    } catch (TimeoutException eT) {
+      future.cancel(true);
+      sinkCounter.incrementConnectionFailedCount();
+      throw eT;
+    } catch (ExecutionException e1) {
+      sinkCounter.incrementConnectionFailedCount();
+      Throwable cause = e1.getCause();
+      if (cause instanceof IOException) {
+        throw new StreamingException("I/O Failure", (IOException) cause);
+      } else if (cause instanceof StreamingException) {
+        throw (StreamingException) cause;
+      } else if (cause instanceof TimeoutException) {
+        throw new StreamingException("Operation Timed Out.", (TimeoutException) cause);
+      } else if (cause instanceof RuntimeException) {
+        throw (RuntimeException) cause;
+      } else if (cause instanceof InterruptedException) {
+        throw (InterruptedException) cause;
+      }
+      throw new StreamingException(e1.getMessage(), e1);
+    }
+  }
+
+  long getLastUsed() {
+    return lastUsed;
+  }
+
+  /**
+   * Simple interface whose <tt>call</tt> method is called by
+   * {@link #timedCall(CallRunner1)} in a new thread inside a
+   * {@linkplain java.security.PrivilegedExceptionAction#run()} call.
+   * @param <T> type of the value returned by the call
+   */
+  private interface CallRunner<T> {
+    T call() throws Exception;
+  }
+
+  private interface CallRunner1<T> {
+    T call() throws StreamingException, InterruptedException, Failure;
+  }
+
+  public static class Failure extends Exception {
+    public Failure(String msg, Throwable cause) {
+      super(msg, cause);
+    }
+  }
+
+  public static class WriteException extends Failure {
+    public WriteException(HiveEndPoint endPoint, Long currentTxnId, Throwable cause) {
+      super("Failed writing to : " + endPoint + ". TxnID : " + currentTxnId, cause);
+    }
+  }
+
+  public static class CommitException extends Failure {
+    public CommitException(HiveEndPoint endPoint, Long txnID, Throwable cause) {
+      super("Commit of Txn " + txnID + " failed on EndPoint: " + endPoint, cause);
+    }
+  }
+
+  public static class ConnectException extends Failure {
+    public ConnectException(HiveEndPoint ep, Throwable cause) {
+      super("Failed connecting to EndPoint " + ep, cause);
+    }
+  }
+
+  public static class TxnBatchException extends Failure {
+    public TxnBatchException(HiveEndPoint ep, Throwable cause) {
+      super("Failed acquiring Transaction Batch from EndPoint: " + ep, cause);
+    }
+  }
+
+  private class TxnFailure extends Failure {
+    public TxnFailure(TransactionBatch txnBatch, Throwable cause) {
+      super("Failed switching to next Txn in TxnBatch " + txnBatch, cause);
+    }
+  }
+}
diff --git a/code/flume-ng-sinks/flume-hive-sink/src/test/java/org/apache/flume/sink/hive/TestHiveSink.java b/code/flume-ng-sinks/flume-hive-sink/src/test/java/org/apache/flume/sink/hive/TestHiveSink.java
new file mode 100644
index 0000000..c417404
--- /dev/null
+++ b/code/flume-ng-sinks/flume-hive-sink/src/test/java/org/apache/flume/sink/hive/TestHiveSink.java
@@ -0,0 +1,423 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + + +package org.apache.flume.sink.hive; + +import com.google.common.collect.Lists; +import junit.framework.Assert; +import org.apache.flume.Channel; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.EventDeliveryException; +import org.apache.flume.Transaction; +import org.apache.flume.channel.MemoryChannel; +import org.apache.flume.conf.Configurables; +import org.apache.flume.event.SimpleEvent; +import org.apache.flume.instrumentation.SinkCounter; +import org.apache.hadoop.hive.cli.CliSessionState; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.metastore.txn.TxnDbUtil; +import org.apache.hadoop.hive.ql.CommandNeedRetryException; +import org.apache.hadoop.hive.ql.Driver; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.session.SessionState; +import org.junit.After; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Calendar; +import java.util.List; 
+import java.util.UUID; + +public class TestHiveSink { + // 1) partitioned table + static final String dbName = "testing"; + static final String tblName = "alerts"; + + public static final String PART1_NAME = "continent"; + public static final String PART2_NAME = "country"; + public static final String[] partNames = { PART1_NAME, PART2_NAME }; + + private static final String COL1 = "id"; + private static final String COL2 = "msg"; + final String[] colNames = {COL1,COL2}; + private String[] colTypes = { "int", "string" }; + + private static final String PART1_VALUE = "Asia"; + private static final String PART2_VALUE = "India"; + private final ArrayList partitionVals; + + // 2) un-partitioned table + static final String dbName2 = "testing2"; + static final String tblName2 = "alerts2"; + final String[] colNames2 = {COL1,COL2}; + private String[] colTypes2 = { "int", "string" }; + + HiveSink sink = new HiveSink(); + + private final HiveConf conf; + + private final Driver driver; + + final String metaStoreURI; + + @Rule + public TemporaryFolder dbFolder = new TemporaryFolder(); + + private static final Logger LOG = LoggerFactory.getLogger(HiveSink.class); + + public TestHiveSink() throws Exception { + partitionVals = new ArrayList(2); + partitionVals.add(PART1_VALUE); + partitionVals.add(PART2_VALUE); + + metaStoreURI = "null"; + + conf = new HiveConf(this.getClass()); + TestUtil.setConfValues(conf); + + // 1) prepare hive + TxnDbUtil.cleanDb(); + TxnDbUtil.prepDb(); + + // 2) Setup Hive client + SessionState.start(new CliSessionState(conf)); + driver = new Driver(conf); + + } + + + @Before + public void setUp() throws Exception { + TestUtil.dropDB(conf, dbName); + + sink = new HiveSink(); + sink.setName("HiveSink-" + UUID.randomUUID().toString()); + + String dbLocation = dbFolder.newFolder(dbName).getCanonicalPath() + ".db"; + dbLocation = dbLocation.replaceAll("\\\\","/"); // for windows paths + TestUtil.createDbAndTable(driver, dbName, tblName, partitionVals, 
colNames, + colTypes, partNames, dbLocation); + } + + @After + public void tearDown() throws MetaException, HiveException { + TestUtil.dropDB(conf, dbName); + } + + + @Test + public void testSingleWriterSimplePartitionedTable() + throws EventDeliveryException, IOException, CommandNeedRetryException { + int totalRecords = 4; + int batchSize = 2; + int batchCount = totalRecords / batchSize; + + Context context = new Context(); + context.put("hive.metastore", metaStoreURI); + context.put("hive.database",dbName); + context.put("hive.table",tblName); + context.put("hive.partition", PART1_VALUE + "," + PART2_VALUE); + context.put("autoCreatePartitions","false"); + context.put("batchSize","" + batchSize); + context.put("serializer", HiveDelimitedTextSerializer.ALIAS); + context.put("serializer.fieldnames", COL1 + ",," + COL2 + ","); + context.put("heartBeatInterval", "0"); + + Channel channel = startSink(sink, context); + + List bodies = Lists.newArrayList(); + + // push the events in two batches + Transaction txn = channel.getTransaction(); + txn.begin(); + for (int j = 1; j <= totalRecords; j++) { + Event event = new SimpleEvent(); + String body = j + ",blah,This is a log message,other stuff"; + event.setBody(body.getBytes()); + bodies.add(body); + channel.put(event); + } + // execute sink to process the events + txn.commit(); + txn.close(); + + + checkRecordCountInTable(0, dbName, tblName); + for (int i = 0; i < batchCount ; i++) { + sink.process(); + } + sink.stop(); + checkRecordCountInTable(totalRecords, dbName, tblName); + } + + @Test + public void testSingleWriterSimpleUnPartitionedTable() + throws Exception { + TestUtil.dropDB(conf, dbName2); + String dbLocation = dbFolder.newFolder(dbName2).getCanonicalPath() + ".db"; + dbLocation = dbLocation.replaceAll("\\\\","/"); // for windows paths + TestUtil.createDbAndTable(driver, dbName2, tblName2, null, colNames2, colTypes2, + null, dbLocation); + + try { + int totalRecords = 4; + int batchSize = 2; + int batchCount = 
totalRecords / batchSize; + + Context context = new Context(); + context.put("hive.metastore", metaStoreURI); + context.put("hive.database", dbName2); + context.put("hive.table", tblName2); + context.put("autoCreatePartitions","false"); + context.put("batchSize","" + batchSize); + context.put("serializer", HiveDelimitedTextSerializer.ALIAS); + context.put("serializer.fieldnames", COL1 + ",," + COL2 + ","); + context.put("heartBeatInterval", "0"); + + Channel channel = startSink(sink, context); + + List bodies = Lists.newArrayList(); + + // Push the events in two batches + Transaction txn = channel.getTransaction(); + txn.begin(); + for (int j = 1; j <= totalRecords; j++) { + Event event = new SimpleEvent(); + String body = j + ",blah,This is a log message,other stuff"; + event.setBody(body.getBytes()); + bodies.add(body); + channel.put(event); + } + + txn.commit(); + txn.close(); + + checkRecordCountInTable(0, dbName2, tblName2); + for (int i = 0; i < batchCount ; i++) { + sink.process(); + } + + // check before & after stopping sink + checkRecordCountInTable(totalRecords, dbName2, tblName2); + sink.stop(); + checkRecordCountInTable(totalRecords, dbName2, tblName2); + } finally { + TestUtil.dropDB(conf, dbName2); + } + } + + @Test + public void testSingleWriterUseHeaders() + throws Exception { + String[] colNames = {COL1, COL2}; + String PART1_NAME = "country"; + String PART2_NAME = "hour"; + String[] partNames = {PART1_NAME, PART2_NAME}; + List partitionVals = null; + String PART1_VALUE = "%{" + PART1_NAME + "}"; + String PART2_VALUE = "%y-%m-%d-%k"; + partitionVals = new ArrayList(2); + partitionVals.add(PART1_VALUE); + partitionVals.add(PART2_VALUE); + + String tblName = "hourlydata"; + TestUtil.dropDB(conf, dbName2); + String dbLocation = dbFolder.newFolder(dbName2).getCanonicalPath() + ".db"; + dbLocation = dbLocation.replaceAll("\\\\","/"); // for windows paths + TestUtil.createDbAndTable(driver, dbName2, tblName, partitionVals, colNames, + colTypes, 
partNames, dbLocation); + + int totalRecords = 4; + int batchSize = 2; + int batchCount = totalRecords / batchSize; + + Context context = new Context(); + context.put("hive.metastore",metaStoreURI); + context.put("hive.database",dbName2); + context.put("hive.table",tblName); + context.put("hive.partition", PART1_VALUE + "," + PART2_VALUE); + context.put("autoCreatePartitions","true"); + context.put("useLocalTimeStamp", "false"); + context.put("batchSize","" + batchSize); + context.put("serializer", HiveDelimitedTextSerializer.ALIAS); + context.put("serializer.fieldnames", COL1 + ",," + COL2 + ","); + context.put("heartBeatInterval", "0"); + + Channel channel = startSink(sink, context); + + Calendar eventDate = Calendar.getInstance(); + List bodies = Lists.newArrayList(); + + // push events in two batches - two per batch. each batch is diff hour + Transaction txn = channel.getTransaction(); + txn.begin(); + for (int j = 1; j <= totalRecords; j++) { + Event event = new SimpleEvent(); + String body = j + ",blah,This is a log message,other stuff"; + event.setBody(body.getBytes()); + eventDate.clear(); + eventDate.set(2014, 03, 03, j % batchCount, 1); // yy mm dd hh mm + event.getHeaders().put( "timestamp", + String.valueOf(eventDate.getTimeInMillis()) ); + event.getHeaders().put( PART1_NAME, "Asia" ); + bodies.add(body); + channel.put(event); + } + // execute sink to process the events + txn.commit(); + txn.close(); + + checkRecordCountInTable(0, dbName2, tblName); + for (int i = 0; i < batchCount ; i++) { + sink.process(); + } + checkRecordCountInTable(totalRecords, dbName2, tblName); + sink.stop(); + + // verify counters + SinkCounter counter = sink.getCounter(); + Assert.assertEquals(2, counter.getConnectionCreatedCount()); + Assert.assertEquals(2, counter.getConnectionClosedCount()); + Assert.assertEquals(2, counter.getBatchCompleteCount()); + Assert.assertEquals(0, counter.getBatchEmptyCount()); + Assert.assertEquals(0, counter.getConnectionFailedCount() ); + 
Assert.assertEquals(4, counter.getEventDrainAttemptCount());
+    Assert.assertEquals(4, counter.getEventDrainSuccessCount() );
+
+  }
+
+  @Test
+  public void testHeartBeat()
+      throws EventDeliveryException, IOException, CommandNeedRetryException {
+    int batchSize = 2;
+    int batchCount = 3;
+    int totalRecords = batchCount * batchSize;
+    Context context = new Context();
+    context.put("hive.metastore", metaStoreURI);
+    context.put("hive.database", dbName);
+    context.put("hive.table", tblName);
+    context.put("hive.partition", PART1_VALUE + "," + PART2_VALUE);
+    context.put("autoCreatePartitions","true");
+    context.put("batchSize","" + batchSize);
+    context.put("serializer", HiveDelimitedTextSerializer.ALIAS);
+    context.put("serializer.fieldnames", COL1 + ",," + COL2 + ",");
+    context.put("hive.txnsPerBatchAsk", "20");
+    context.put("heartBeatInterval", "3"); // heartbeat in seconds
+
+    Channel channel = startSink(sink, context);
+
+    List<String> bodies = Lists.newArrayList();
+
+    // push the events in three batches, pausing after each to allow a heartbeat
+    for (int i = 0; i < batchCount; i++) {
+      Transaction txn = channel.getTransaction();
+      txn.begin();
+      for (int j = 1; j <= batchSize; j++) {
+        Event event = new SimpleEvent();
+        String body = i * j + ",blah,This is a log message,other stuff";
+        event.setBody(body.getBytes());
+        bodies.add(body);
+        channel.put(event);
+      }
+      // commit the channel txn, then run the sink to drain the events
+      txn.commit();
+      txn.close();
+
+      sink.process();
+      sleep(3000); // allow heartbeat to happen
+    }
+
+    sink.stop();
+    checkRecordCountInTable(totalRecords, dbName, tblName);
+  }
+
+  @Test
+  public void testJsonSerializer() throws Exception {
+    int batchSize = 2;
+    int batchCount = 2;
+    int totalRecords = batchCount * batchSize;
+    Context context = new Context();
+    context.put("hive.metastore",metaStoreURI);
+    context.put("hive.database",dbName);
+    context.put("hive.table",tblName);
+    context.put("hive.partition", PART1_VALUE + "," + PART2_VALUE);
+    context.put("autoCreatePartitions","true");
+    context.put("batchSize","" + batchSize);
+    context.put("serializer", HiveJsonSerializer.ALIAS);
+    context.put("serializer.fieldnames", COL1 + ",," + COL2 + ",");
+    context.put("heartBeatInterval", "0");
+
+    Channel channel = startSink(sink, context);
+
+    List<String> bodies = Lists.newArrayList();
+
+    // push the events in two batches
+    for (int i = 0; i < batchCount; i++) {
+      Transaction txn = channel.getTransaction();
+      txn.begin();
+      for (int j = 1; j <= batchSize; j++) {
+        Event event = new SimpleEvent();
+        String body = "{\"id\" : 1, \"msg\" : \"using json serializer\"}";
+        event.setBody(body.getBytes());
+        bodies.add(body);
+        channel.put(event);
+      }
+      // commit the channel txn, then run the sink to drain the events
+      txn.commit();
+      txn.close();
+
+      sink.process();
+    }
+    checkRecordCountInTable(totalRecords, dbName, tblName);
+    sink.stop();
+    checkRecordCountInTable(totalRecords, dbName, tblName);
+  }
+
+  private void sleep(int n) {
+    try {
+      Thread.sleep(n);
+    } catch (InterruptedException e) {
+    }
+  }
+
+  private static Channel startSink(HiveSink sink, Context context) {
+    Configurables.configure(sink, context);
+
+    Channel channel = new MemoryChannel();
+    Configurables.configure(channel, context);
+    sink.setChannel(channel);
+    sink.start();
+    return channel;
+  }
+
+  private void checkRecordCountInTable(int expectedCount, String db, String tbl)
+      throws CommandNeedRetryException, IOException {
+    int count = TestUtil.listRecordsInTable(driver, db, tbl).size();
+    Assert.assertEquals(expectedCount, count);
+  }
+}
diff --git a/code/flume-ng-sinks/flume-hive-sink/src/test/java/org/apache/flume/sink/hive/TestHiveWriter.java b/code/flume-ng-sinks/flume-hive-sink/src/test/java/org/apache/flume/sink/hive/TestHiveWriter.java
new file mode 100644
index 0000000..4d7c9bb
--- /dev/null
+++ b/code/flume-ng-sinks/flume-hive-sink/src/test/java/org/apache/flume/sink/hive/TestHiveWriter.java
@@ -0,0 +1,351 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license
agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.flume.sink.hive; + +import com.google.common.util.concurrent.ThreadFactoryBuilder; +import junit.framework.Assert; +import org.apache.flume.Context; +import org.apache.flume.event.SimpleEvent; +import org.apache.flume.instrumentation.SinkCounter; +import org.apache.hadoop.hive.cli.CliSessionState; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.txn.TxnDbUtil; +import org.apache.hadoop.hive.ql.CommandNeedRetryException; +import org.apache.hadoop.hive.ql.Driver; +import org.apache.hadoop.hive.ql.session.SessionState; +import org.apache.hive.hcatalog.streaming.HiveEndPoint; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; + +public class TestHiveWriter { + static final String dbName = "testing"; + static final String tblName = "alerts"; + + public static final String PART1_NAME = "continent"; + public static final String PART2_NAME = "country"; + public static final String[] partNames = { PART1_NAME, PART2_NAME }; + + private static final String COL1 = "id"; + private static final String COL2 
= "msg"; + final String[] colNames = {COL1,COL2}; + private String[] colTypes = { "int", "string" }; + + private static final String PART1_VALUE = "Asia"; + private static final String PART2_VALUE = "India"; + private final ArrayList partVals; + + private final String metaStoreURI; + + private HiveDelimitedTextSerializer serializer; + + private final HiveConf conf; + + private ExecutorService callTimeoutPool; + int timeout = 10000; // msec + + @Rule + public TemporaryFolder dbFolder = new TemporaryFolder(); + + private final Driver driver; + + public TestHiveWriter() throws Exception { + partVals = new ArrayList(2); + partVals.add(PART1_VALUE); + partVals.add(PART2_VALUE); + + metaStoreURI = null; + + int callTimeoutPoolSize = 1; + callTimeoutPool = Executors.newFixedThreadPool(callTimeoutPoolSize, + new ThreadFactoryBuilder().setNameFormat("hiveWriterTest").build()); + + // 1) Start metastore + conf = new HiveConf(this.getClass()); + TestUtil.setConfValues(conf); + if (metaStoreURI != null) { + conf.setVar(HiveConf.ConfVars.METASTOREURIS, metaStoreURI); + } + + // 2) Setup Hive client + SessionState.start(new CliSessionState(conf)); + driver = new Driver(conf); + + } + + @Before + public void setUp() throws Exception { + // 1) prepare hive + TxnDbUtil.cleanDb(); + TxnDbUtil.prepDb(); + + // 1) Setup tables + TestUtil.dropDB(conf, dbName); + String dbLocation = dbFolder.newFolder(dbName).getCanonicalPath() + ".db"; + dbLocation = dbLocation.replaceAll("\\\\","/"); // for windows paths + TestUtil.createDbAndTable(driver, dbName, tblName, partVals, colNames, colTypes, partNames, + dbLocation); + + // 2) Setup serializer + Context ctx = new Context(); + ctx.put("serializer.fieldnames", COL1 + ",," + COL2 + ","); + serializer = new HiveDelimitedTextSerializer(); + serializer.configure(ctx); + } + + @Test + public void testInstantiate() throws Exception { + HiveEndPoint endPoint = new HiveEndPoint(metaStoreURI, dbName, tblName, partVals); + SinkCounter sinkCounter = new 
SinkCounter(this.getClass().getName()); + HiveWriter writer = new HiveWriter(endPoint, 10, true, timeout, callTimeoutPool, "flumetest", + serializer, sinkCounter); + + writer.close(); + } + + @Test + public void testWriteBasic() throws Exception { + HiveEndPoint endPoint = new HiveEndPoint(metaStoreURI, dbName, tblName, partVals); + SinkCounter sinkCounter = new SinkCounter(this.getClass().getName()); + HiveWriter writer = new HiveWriter(endPoint, 10, true, timeout, callTimeoutPool, "flumetest", + serializer, sinkCounter); + + writeEvents(writer,3); + writer.flush(false); + writer.close(); + checkRecordCountInTable(3); + } + + @Test + public void testWriteMultiFlush() throws Exception { + HiveEndPoint endPoint = new HiveEndPoint(metaStoreURI, dbName, tblName, partVals); + SinkCounter sinkCounter = new SinkCounter(this.getClass().getName()); + + HiveWriter writer = new HiveWriter(endPoint, 10, true, timeout, callTimeoutPool, "flumetest", + serializer, sinkCounter); + + checkRecordCountInTable(0); + SimpleEvent event = new SimpleEvent(); + + String REC1 = "1,xyz,Hello world,abc"; + event.setBody(REC1.getBytes()); + writer.write(event); + checkRecordCountInTable(0); + writer.flush(true); + checkRecordCountInTable(1); + + String REC2 = "2,xyz,Hello world,abc"; + event.setBody(REC2.getBytes()); + writer.write(event); + checkRecordCountInTable(1); + writer.flush(true); + checkRecordCountInTable(2); + + String REC3 = "3,xyz,Hello world,abc"; + event.setBody(REC3.getBytes()); + writer.write(event); + writer.flush(true); + checkRecordCountInTable(3); + writer.close(); + + checkRecordCountInTable(3); + } + + @Test + public void testTxnBatchConsumption() throws Exception { + // get a small txn batch and consume it, then roll to a new batch and verify + // the number of remaining txns to ensure txns are not accidentally skipped + + HiveEndPoint endPoint = new HiveEndPoint(metaStoreURI, dbName, tblName, partVals); + SinkCounter sinkCounter = new SinkCounter(this.getClass().getName()); 
+ + int txnPerBatch = 3; + + HiveWriter writer = new HiveWriter(endPoint, txnPerBatch, true, timeout, callTimeoutPool, + "flumetest", serializer, sinkCounter); + + Assert.assertEquals(writer.getRemainingTxns(),2); + writer.flush(true); + + Assert.assertEquals(writer.getRemainingTxns(), 1); + writer.flush(true); + + Assert.assertEquals(writer.getRemainingTxns(), 0); + writer.flush(true); + + // flip over to next batch + Assert.assertEquals(writer.getRemainingTxns(), 2); + writer.flush(true); + + Assert.assertEquals(writer.getRemainingTxns(), 1); + + writer.close(); + + } + + private void checkRecordCountInTable(int expectedCount) + throws CommandNeedRetryException, IOException { + int count = TestUtil.listRecordsInTable(driver, dbName, tblName).size(); + Assert.assertEquals(expectedCount, count); + } + + /** + * Sets up input fields to have same order as table columns, + * Also sets the separator on serde to be same as i/p field separator + * @throws Exception + */ + @Test + public void testInOrderWrite() throws Exception { + HiveEndPoint endPoint = new HiveEndPoint(metaStoreURI, dbName, tblName, partVals); + SinkCounter sinkCounter = new SinkCounter(this.getClass().getName()); + int timeout = 5000; // msec + + HiveDelimitedTextSerializer serializer2 = new HiveDelimitedTextSerializer(); + Context ctx = new Context(); + ctx.put("serializer.fieldnames", COL1 + "," + COL2); + ctx.put("serializer.serdeSeparator", ","); + serializer2.configure(ctx); + + + HiveWriter writer = new HiveWriter(endPoint, 10, true, timeout, callTimeoutPool, + "flumetest", serializer2, sinkCounter); + + SimpleEvent event = new SimpleEvent(); + event.setBody("1,Hello world 1".getBytes()); + writer.write(event); + event.setBody("2,Hello world 2".getBytes()); + writer.write(event); + event.setBody("3,Hello world 3".getBytes()); + writer.write(event); + writer.flush(false); + writer.close(); + } + + @Test + public void testSerdeSeparatorCharParsing() throws Exception { + HiveEndPoint endPoint = new 
HiveEndPoint(metaStoreURI, dbName, tblName, partVals); + SinkCounter sinkCounter = new SinkCounter(this.getClass().getName()); + int timeout = 10000; // msec + + // 1) single character serdeSeparator + HiveDelimitedTextSerializer serializer1 = new HiveDelimitedTextSerializer(); + Context ctx = new Context(); + ctx.put("serializer.fieldnames", COL1 + "," + COL2); + ctx.put("serializer.serdeSeparator", ","); + serializer1.configure(ctx); + // should not throw + + + // 2) special character as serdeSeparator + HiveDelimitedTextSerializer serializer2 = new HiveDelimitedTextSerializer(); + ctx = new Context(); + ctx.put("serializer.fieldnames", COL1 + "," + COL2); + ctx.put("serializer.serdeSeparator", "'\t'"); + serializer2.configure(ctx); + // should not throw + + + // 3) bad spec as serdeSeparator + HiveDelimitedTextSerializer serializer3 = new HiveDelimitedTextSerializer(); + ctx = new Context(); + ctx.put("serializer.fieldnames", COL1 + "," + COL2); + ctx.put("serializer.serdeSeparator", "ab"); + try { + serializer3.configure(ctx); + Assert.fail("Bad serdeSeparator character was accepted"); + } catch (Exception e) { + // expect an exception + } + + } + + @Test + public void testSecondWriterBeforeFirstCommits() throws Exception { + // here we open a new writer while the first is still writing (not committed) + HiveEndPoint endPoint1 = new HiveEndPoint(metaStoreURI, dbName, tblName, partVals); + ArrayList<String> partVals2 = new ArrayList<String>(2); + partVals2.add(PART1_VALUE); + partVals2.add("Nepal"); + HiveEndPoint endPoint2 = new HiveEndPoint(metaStoreURI, dbName, tblName, partVals2); + + SinkCounter sinkCounter1 = new SinkCounter(this.getClass().getName()); + SinkCounter sinkCounter2 = new SinkCounter(this.getClass().getName()); + + HiveWriter writer1 = new HiveWriter(endPoint1, 10, true, timeout, callTimeoutPool, "flumetest", + serializer, sinkCounter1); + + writeEvents(writer1, 3); + + HiveWriter writer2 = new HiveWriter(endPoint2, 10, true, timeout, 
callTimeoutPool, "flumetest", + serializer, sinkCounter2); + writeEvents(writer2, 3); + writer2.flush(false); // commit + + writer1.flush(false); // commit + writer1.close(); + + writer2.close(); + } + + @Test + public void testSecondWriterAfterFirstCommits() throws Exception { + // here we open a new writer after the first writer has committed one txn + HiveEndPoint endPoint1 = new HiveEndPoint(metaStoreURI, dbName, tblName, partVals); + ArrayList partVals2 = new ArrayList(2); + partVals2.add(PART1_VALUE); + partVals2.add("Nepal"); + HiveEndPoint endPoint2 = new HiveEndPoint(metaStoreURI, dbName, tblName, partVals2); + + SinkCounter sinkCounter1 = new SinkCounter(this.getClass().getName()); + SinkCounter sinkCounter2 = new SinkCounter(this.getClass().getName()); + + HiveWriter writer1 = new HiveWriter(endPoint1, 10, true, timeout, callTimeoutPool, "flumetest", + serializer, sinkCounter1); + + writeEvents(writer1, 3); + + writer1.flush(false); // commit + + + HiveWriter writer2 = new HiveWriter(endPoint2, 10, true, timeout, callTimeoutPool, "flumetest", + serializer, sinkCounter2); + writeEvents(writer2, 3); + writer2.flush(false); // commit + + + writer1.close(); + writer2.close(); + } + + private void writeEvents(HiveWriter writer, int count) + throws InterruptedException, HiveWriter.WriteException { + SimpleEvent event = new SimpleEvent(); + for (int i = 1; i <= count; i++) { + event.setBody((i + ",xyz,Hello world,abc").getBytes()); + writer.write(event); + } + } +} diff --git a/code/flume-ng-sinks/flume-hive-sink/src/test/java/org/apache/flume/sink/hive/TestUtil.java b/code/flume-ng-sinks/flume-hive-sink/src/test/java/org/apache/flume/sink/hive/TestUtil.java new file mode 100644 index 0000000..1fcb4eb --- /dev/null +++ b/code/flume-ng-sinks/flume-hive-sink/src/test/java/org/apache/flume/sink/hive/TestUtil.java @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + + +package org.apache.flume.sink.hive; + +import org.apache.hadoop.fs.FileStatus; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.RawLocalFileSystem; +import org.apache.hadoop.fs.permission.FsPermission; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.HiveMetaStoreClient; +import org.apache.hadoop.hive.metastore.IMetaStoreClient; +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.ql.CommandNeedRetryException; +import org.apache.hadoop.hive.ql.Driver; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.shims.ShimLoader; +import org.apache.hadoop.util.Shell; +import org.apache.hive.hcatalog.streaming.QueryFailedException; +import org.apache.thrift.TException; + +import java.io.File; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.net.URI; +import java.net.URISyntaxException; +import java.util.ArrayList; +import java.util.List; + +public class TestUtil { + + private static final String txnMgr = "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager"; + + /** + * Set up the configuration so it will use the DbTxnManager, concurrency will be set to true, + * and the JDBC configs will be set for putting the transaction 
and lock info in the embedded + * metastore. + * @param conf HiveConf to add these values to. + */ + public static void setConfValues(HiveConf conf) { + conf.setVar(HiveConf.ConfVars.HIVE_TXN_MANAGER, txnMgr); + conf.setBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY, true); + conf.set("fs.raw.impl", RawFileSystem.class.getName()); + } + + public static void createDbAndTable(Driver driver, String databaseName, + String tableName, List partVals, + String[] colNames, String[] colTypes, + String[] partNames, String dbLocation) + throws Exception { + String dbUri = "raw://" + dbLocation; + String tableLoc = dbUri + Path.SEPARATOR + tableName; + + runDDL(driver, "create database IF NOT EXISTS " + databaseName + " location '" + dbUri + "'"); + runDDL(driver, "use " + databaseName); + String crtTbl = "create table " + tableName + + " ( " + getTableColumnsStr(colNames,colTypes) + " )" + + getPartitionStmtStr(partNames) + + " clustered by ( " + colNames[0] + " )" + + " into 10 buckets " + + " stored as orc " + + " location '" + tableLoc + "'" + + " TBLPROPERTIES ('transactional'='true')"; + + runDDL(driver, crtTbl); + System.out.println("crtTbl = " + crtTbl); + if (partNames != null && partNames.length != 0) { + String addPart = "alter table " + tableName + " add partition ( " + + getTablePartsStr2(partNames, partVals) + " )"; + runDDL(driver, addPart); + } + } + + private static String getPartitionStmtStr(String[] partNames) { + if ( partNames == null || partNames.length == 0) { + return ""; + } + return " partitioned by (" + getTablePartsStr(partNames) + " )"; + } + + // delete db and all tables in it + public static void dropDB(HiveConf conf, String databaseName) + throws HiveException, MetaException { + IMetaStoreClient client = new HiveMetaStoreClient(conf); + try { + for (String table : client.listTableNamesByFilter(databaseName, "", (short)-1)) { + client.dropTable(databaseName, table, true, true); + } + client.dropDatabase(databaseName); + } catch (TException e) 
{ + client.close(); + } + } + + private static String getTableColumnsStr(String[] colNames, String[] colTypes) { + StringBuffer sb = new StringBuffer(); + for (int i = 0; i < colNames.length; ++i) { + sb.append(colNames[i] + " " + colTypes[i]); + if (i < colNames.length - 1) { + sb.append(","); + } + } + return sb.toString(); + } + + // converts partNames into "partName1 string, partName2 string" + private static String getTablePartsStr(String[] partNames) { + if (partNames == null || partNames.length == 0) { + return ""; + } + StringBuffer sb = new StringBuffer(); + for (int i = 0; i < partNames.length; ++i) { + sb.append(partNames[i] + " string"); + if (i < partNames.length - 1) { + sb.append(","); + } + } + return sb.toString(); + } + + // converts partNames,partVals into "partName1=val1, partName2=val2" + private static String getTablePartsStr2(String[] partNames, List partVals) { + StringBuffer sb = new StringBuffer(); + for (int i = 0; i < partVals.size(); ++i) { + sb.append(partNames[i] + " = '" + partVals.get(i) + "'"); + if (i < partVals.size() - 1) { + sb.append(","); + } + } + return sb.toString(); + } + + public static ArrayList listRecordsInTable(Driver driver, String dbName, String tblName) + throws CommandNeedRetryException, IOException { + driver.run("select * from " + dbName + "." + tblName); + ArrayList res = new ArrayList(); + driver.getResults(res); + return res; + } + + public static ArrayList listRecordsInPartition(Driver driver, String dbName, + String tblName, String continent, + String country) + throws CommandNeedRetryException, IOException { + driver.run("select * from " + dbName + "." 
+ tblName + " where continent='" + + continent + "' and country='" + country + "'"); + ArrayList res = new ArrayList(); + driver.getResults(res); + return res; + } + + public static class RawFileSystem extends RawLocalFileSystem { + private static final URI NAME; + + static { + try { + NAME = new URI("raw:///"); + } catch (URISyntaxException se) { + throw new IllegalArgumentException("bad uri", se); + } + } + + @Override + public URI getUri() { + return NAME; + } + + static String execCommand(File f, String... cmd) throws IOException { + String[] args = new String[cmd.length + 1]; + System.arraycopy(cmd, 0, args, 0, cmd.length); + args[cmd.length] = f.getCanonicalPath(); + String output = Shell.execCommand(args); + return output; + } + + @Override + public FileStatus getFileStatus(Path path) throws IOException { + File file = pathToFile(path); + if (!file.exists()) { + throw new FileNotFoundException("Can't find " + path); + } + // get close enough + short mod = 0; + if (file.canRead()) { + mod |= 0444; + } + if (file.canWrite()) { + mod |= 0200; + } + if (file.canExecute()) { + mod |= 0111; + } + ShimLoader.getHadoopShims(); + return new FileStatus(file.length(), file.isDirectory(), 1, 1024, + file.lastModified(), file.lastModified(), + FsPermission.createImmutable(mod), "owen", "users", path); + } + } + + private static boolean runDDL(Driver driver, String sql) throws QueryFailedException { + int retryCount = 1; // # of times to retry if first attempt fails + for (int attempt = 0; attempt <= retryCount; ++attempt) { + try { + driver.run(sql); + return true; + } catch (CommandNeedRetryException e) { + if (attempt == retryCount) { + throw new QueryFailedException(sql, e); + } + continue; + } + } // for + return false; + } + +} diff --git a/code/flume-ng-sinks/flume-hive-sink/src/test/resources/log4j.properties b/code/flume-ng-sinks/flume-hive-sink/src/test/resources/log4j.properties new file mode 100644 index 0000000..252b5ea --- /dev/null +++ 
b/code/flume-ng-sinks/flume-hive-sink/src/test/resources/log4j.properties @@ -0,0 +1,26 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +log4j.rootLogger = INFO, out + +log4j.appender.out = org.apache.log4j.ConsoleAppender +log4j.appender.out.layout = org.apache.log4j.PatternLayout +log4j.appender.out.layout.ConversionPattern = %d (%t) [%p - %l] %m%n + +log4j.logger.org.apache.flume = DEBUG +log4j.logger.org.apache.hadoop = WARN +log4j.logger.org.mortbay = WARN diff --git a/code/flume-ng-sinks/flume-irc-sink/pom.xml b/code/flume-ng-sinks/flume-irc-sink/pom.xml new file mode 100644 index 0000000..6345f59 --- /dev/null +++ b/code/flume-ng-sinks/flume-irc-sink/pom.xml @@ -0,0 +1,83 @@ + + + <project> + + <modelVersion>4.0.0</modelVersion> + + <parent> + <artifactId>flume-ng-sinks</artifactId> + <groupId>org.apache.flume</groupId> + <version>1.7.0</version> + </parent> + + <groupId>org.apache.flume.flume-ng-sinks</groupId> + <artifactId>flume-irc-sink</artifactId> + <name>Flume NG IRC Sink</name> + + <build> + <plugins> + <plugin> + <groupId>org.apache.rat</groupId> + <artifactId>apache-rat-plugin</artifactId> + </plugin> + </plugins> + </build> + + <dependencies> + + <dependency> + <groupId>org.apache.flume</groupId> + <artifactId>flume-ng-sdk</artifactId> + </dependency> + + <dependency> + <groupId>org.apache.flume</groupId> + <artifactId>flume-ng-configuration</artifactId> + </dependency> + + <dependency> + <groupId>org.apache.flume</groupId> + <artifactId>flume-ng-core</artifactId> + </dependency> + + <dependency> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-api</artifactId> + </dependency> + + <dependency> + <groupId>org.schwering</groupId> + <artifactId>irclib</artifactId> + </dependency> + + <dependency> + <groupId>junit</groupId> + <artifactId>junit</artifactId> + <scope>test</scope> + </dependency> + + <dependency> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-log4j12</artifactId> + <scope>test</scope> + </dependency> + + </dependencies> + + </project> diff --git 
a/code/flume-ng-sinks/flume-irc-sink/src/main/java/org/apache/flume/sink/irc/IRCSink.java b/code/flume-ng-sinks/flume-irc-sink/src/main/java/org/apache/flume/sink/irc/IRCSink.java new file mode 100644 index 0000000..52bbfc8 --- /dev/null +++ b/code/flume-ng-sinks/flume-irc-sink/src/main/java/org/apache/flume/sink/irc/IRCSink.java @@ -0,0 +1,266 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.flume.sink.irc; + +import java.io.IOException; + +import org.apache.flume.Channel; +import org.apache.flume.ChannelException; +import org.apache.flume.Context; +import org.apache.flume.CounterGroup; +import org.apache.flume.Event; +import org.apache.flume.EventDeliveryException; +import org.apache.flume.Transaction; +import org.apache.flume.conf.Configurable; +import org.apache.flume.sink.AbstractSink; +import org.schwering.irc.lib.IRCConnection; +import org.schwering.irc.lib.IRCEventListener; +import org.schwering.irc.lib.IRCModeParser; +import org.schwering.irc.lib.IRCUser; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.google.common.base.Preconditions; + +public class IRCSink extends AbstractSink implements Configurable { + + private static final Logger logger = LoggerFactory.getLogger(IRCSink.class); + + private static final int DEFAULT_PORT = 6667; + private static final String DEFAULT_SPLIT_CHARS = "\n"; + + private static final String IRC_CHANNEL_PREFIX = "#"; + + private IRCConnection connection = null; + + private String hostname; + private Integer port; + private String nick; + private String password; + private String user; + private String name; + private String chan; + private Boolean splitLines; + private String splitChars; + + private CounterGroup counterGroup; + + public static class IRCConnectionListener implements IRCEventListener { + + public void onRegistered() { + } + + public void onDisconnected() { + logger.error("IRC sink disconnected"); + } + + public void onError(String msg) { + logger.error("IRC sink error: {}", msg); + } + + public void onError(int num, String msg) { + logger.error("IRC sink error: {} - {}", num, msg); + } + + public void onInvite(String chan, IRCUser u, String nickPass) { + } + + public void onJoin(String chan, IRCUser u) { + } + + public void onKick(String chan, IRCUser u, String nickPass, String msg) { + } + + public void onMode(IRCUser u, String nickPass, String 
mode) { + } + + public void onMode(String chan, IRCUser u, IRCModeParser mp) { + } + + public void onNick(IRCUser u, String nickNew) { + } + + public void onNotice(String target, IRCUser u, String msg) { + } + + public void onPart(String chan, IRCUser u, String msg) { + } + + public void onPrivmsg(String chan, IRCUser u, String msg) { + } + + public void onQuit(IRCUser u, String msg) { + } + + public void onReply(int num, String value, String msg) { + } + + public void onTopic(String chan, IRCUser u, String topic) { + } + + public void onPing(String p) { + } + + public void unknown(String a, String b, String c, String d) { + } + } + + public IRCSink() { + counterGroup = new CounterGroup(); + } + + public void configure(Context context) { + hostname = context.getString("hostname"); + String portStr = context.getString("port"); + nick = context.getString("nick"); + password = context.getString("password"); + user = context.getString("user"); + name = context.getString("name"); + chan = context.getString("chan"); + splitLines = context.getBoolean("splitlines", false); + splitChars = context.getString("splitchars"); + + if (portStr != null) { + port = Integer.parseInt(portStr); + } else { + port = DEFAULT_PORT; + } + + if (splitChars == null) { + splitChars = DEFAULT_SPLIT_CHARS; + } + + Preconditions.checkState(hostname != null, "No hostname specified"); + Preconditions.checkState(nick != null, "No nick specified"); + Preconditions.checkState(chan != null, "No chan specified"); + } + + private void createConnection() throws IOException { + if (connection == null) { + logger.debug( + "Creating new connection to hostname:{} port:{}", + hostname, port); + connection = new IRCConnection(hostname, new int[] { port }, + password, nick, user, name); + connection.addIRCEventListener(new IRCConnectionListener()); + connection.setEncoding("UTF-8"); + connection.setPong(true); + connection.setDaemon(false); + connection.setColors(false); + connection.connect(); + 
connection.send("join " + IRC_CHANNEL_PREFIX + chan); + } + } + + private void destroyConnection() { + if (connection != null) { + logger.debug("Destroying connection to: {}:{}", hostname, port); + connection.close(); + } + + connection = null; + } + + @Override + public void start() { + logger.info("IRC sink starting"); + + try { + createConnection(); + } catch (Exception e) { + logger.error("Unable to create irc client using hostname:" + + hostname + " port:" + port + ". Exception follows.", e); + + /* Try to prevent leaking resources. */ + destroyConnection(); + + /* FIXME: Mark ourselves as failed. */ + return; + } + + super.start(); + + logger.debug("IRC sink {} started", this.getName()); + } + + @Override + public void stop() { + logger.info("IRC sink {} stopping", this.getName()); + + destroyConnection(); + + super.stop(); + + logger.debug("IRC sink {} stopped. Metrics:{}", this.getName(), counterGroup); + } + + private void sendLine(Event event) { + String body = new String(event.getBody()); + + if (splitLines) { + String[] lines = body.split(splitChars); + for (String line: lines) { + connection.doPrivmsg(IRC_CHANNEL_PREFIX + this.chan, line); + } + } else { + connection.doPrivmsg(IRC_CHANNEL_PREFIX + this.chan, body); + } + + } + + @Override + public Status process() throws EventDeliveryException { + Status status = Status.READY; + Channel channel = getChannel(); + Transaction transaction = channel.getTransaction(); + + try { + transaction.begin(); + createConnection(); + + Event event = channel.take(); + + if (event == null) { + counterGroup.incrementAndGet("event.empty"); + status = Status.BACKOFF; + } else { + sendLine(event); + counterGroup.incrementAndGet("event.irc"); + } + + transaction.commit(); + + } catch (ChannelException e) { + transaction.rollback(); + logger.error( + "Unable to get event from channel. 
Exception follows.", e); + status = Status.BACKOFF; + } catch (Exception e) { + transaction.rollback(); + logger.error( + "Unable to communicate with IRC server. Exception follows.", + e); + status = Status.BACKOFF; + destroyConnection(); + } finally { + transaction.close(); + } + + return status; + } +} diff --git a/code/flume-ng-sinks/flume-irc-sink/src/test/java/org/apache/flume/sink/irc/TestIRCSink.java b/code/flume-ng-sinks/flume-irc-sink/src/test/java/org/apache/flume/sink/irc/TestIRCSink.java new file mode 100644 index 0000000..32517d1 --- /dev/null +++ b/code/flume-ng-sinks/flume-irc-sink/src/test/java/org/apache/flume/sink/irc/TestIRCSink.java @@ -0,0 +1,166 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.flume.sink.irc; + +import org.apache.commons.io.FileUtils; +import org.apache.commons.io.IOUtils; +import org.apache.flume.Channel; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.EventDeliveryException; +import org.apache.flume.Sink; +import org.apache.flume.Transaction; +import org.apache.flume.channel.MemoryChannel; +import org.apache.flume.conf.Configurables; +import org.apache.flume.event.EventBuilder; +import org.junit.After; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; + +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.net.ServerSocket; +import java.net.Socket; +import java.util.List; +import java.util.UUID; + +import static org.junit.Assert.fail; + +public class TestIRCSink { + + private File eventFile; + int ircServerPort; + DumbIRCServer dumbIRCServer; + @Rule + public TemporaryFolder folder = new TemporaryFolder(); + + private static int findFreePort() throws IOException { + ServerSocket socket = new ServerSocket(0); + int port = socket.getLocalPort(); + socket.close(); + return port; + } + + @Before + public void setUp() throws IOException { + ircServerPort = findFreePort(); + dumbIRCServer = new DumbIRCServer(ircServerPort); + dumbIRCServer.start(); + eventFile = folder.newFile("eventFile.txt"); + } + + @After + public void tearDown() throws Exception { + dumbIRCServer.shutdownServer(); + } + + @Test + public void testIRCSinkMissingSplitLineProperty() { + Sink ircSink = new IRCSink(); + ircSink.setName("IRC Sink - " + UUID.randomUUID().toString()); + Context context = new Context(); + context.put("hostname", "localhost"); + context.put("port", String.valueOf(ircServerPort)); + context.put("nick", "flume"); + context.put("password", "flume"); + context.put("user", "flume"); + context.put("name", "flume-dev"); + context.put("chan", "flume"); + context.put("splitchars", 
"false"); + Configurables.configure(ircSink, context); + Channel memoryChannel = new MemoryChannel(); + Configurables.configure(memoryChannel, context); + ircSink.setChannel(memoryChannel); + ircSink.start(); + Transaction txn = memoryChannel.getTransaction(); + txn.begin(); + Event event = EventBuilder.withBody("Dummy Event".getBytes()); + memoryChannel.put(event); + txn.commit(); + txn.close(); + try { + Sink.Status status = ircSink.process(); + if (status == Sink.Status.BACKOFF) { + fail("Error occurred"); + } + } catch (EventDeliveryException eDelExcp) { + // noop + } + } + + class DumbIRCServer extends Thread { + int port; + ServerSocket ss; + + public DumbIRCServer(int port) { + this.port = port; + } + + public void run() { + try { + ss = new ServerSocket(port); + while (true) { + try { + Socket socket = ss.accept(); + process(socket); + } catch (Exception ex) { + /* noop */ + } + } + } catch (IOException e) { + // noop + } + } + + public void shutdownServer() throws Exception { + ss.close(); + } + + /** + * Process the incoming request from the IRC client + * + * @param socket IRC client connection socket + * @throws IOException + */ + private void process(Socket socket) throws IOException { + FileOutputStream fileOutputStream = FileUtils.openOutputStream(eventFile); + List<String> input = IOUtils.readLines(socket.getInputStream()); + for (String next : input) { + if (isPrivMessage(next)) { + fileOutputStream.write(next.getBytes()); + fileOutputStream.write("\n".getBytes()); + } + } + fileOutputStream.close(); + socket.close(); + } + + /** + * Checks whether the message is a PRIVMSG command + * + * @param input command received from IRC client + * @return true if the command received is a PRIVMSG + */ + private boolean isPrivMessage(String input) { + return input.startsWith("PRIVMSG"); + } + } +} \ No newline at end of file diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/pom.xml b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/pom.xml new file mode 100644 index 
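0000000..527bcca --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/pom.xml @@ -0,0 +1,93 @@ + + + <project> + + <modelVersion>4.0.0</modelVersion> + + <parent> + <artifactId>flume-ng-sinks</artifactId> + <groupId>org.apache.flume</groupId> + <version>1.7.0</version> + </parent> + + <groupId>org.apache.flume.flume-ng-sinks</groupId> + <artifactId>flume-ng-elasticsearch-sink</artifactId> + <name>Flume NG ElasticSearch Sink</name> + + <build> + <plugins> + <plugin> + <groupId>org.apache.rat</groupId> + <artifactId>apache-rat-plugin</artifactId> + </plugin> + </plugins> + </build> + + <dependencies> + + <dependency> + <groupId>org.apache.flume</groupId> + <artifactId>flume-ng-sdk</artifactId> + </dependency> + + <dependency> + <groupId>org.apache.flume</groupId> + <artifactId>flume-ng-core</artifactId> + </dependency> + + <dependency> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-api</artifactId> + </dependency> + + <dependency> + <groupId>org.elasticsearch</groupId> + <artifactId>elasticsearch</artifactId> + <optional>true</optional> + </dependency> + + <dependency> + <groupId>org.apache.httpcomponents</groupId> + <artifactId>httpclient</artifactId> + </dependency> + + <dependency> + <groupId>junit</groupId> + <artifactId>junit</artifactId> + <scope>test</scope> + </dependency> + + <dependency> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-log4j12</artifactId> + <scope>test</scope> + </dependency> + + <dependency> + <groupId>commons-lang</groupId> + <artifactId>commons-lang</artifactId> + </dependency> + + <dependency> + <groupId>com.google.guava</groupId> + <artifactId>guava</artifactId> + </dependency> + + <dependency> + <groupId>org.mockito</groupId> + <artifactId>mockito-all</artifactId> + <scope>test</scope> + </dependency> + + </dependencies> + </project> diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/AbstractElasticSearchIndexRequestBuilderFactory.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/AbstractElasticSearchIndexRequestBuilderFactory.java new file mode 100644 index 0000000..754155c --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/AbstractElasticSearchIndexRequestBuilderFactory.java @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. 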
See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.elasticsearch; + +import java.io.IOException; + +import org.apache.commons.lang.time.FastDateFormat; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.conf.ComponentConfiguration; +import org.apache.flume.conf.Configurable; +import org.apache.flume.conf.ConfigurableComponent; +import org.apache.flume.formatter.output.BucketPath; +import org.elasticsearch.action.index.IndexRequestBuilder; +import org.elasticsearch.client.Client; + +import com.google.common.annotations.VisibleForTesting; + +/** + * Abstract base class for custom implementations of + * {@link ElasticSearchIndexRequestBuilderFactory}. + */ +public abstract class AbstractElasticSearchIndexRequestBuilderFactory + implements ElasticSearchIndexRequestBuilderFactory { + + /** + * {@link FastDateFormat} to use for index names + * in {@link #getIndexName(String, long)} + */ + protected final FastDateFormat fastDateFormat; + + /** + * Constructor for subclasses + * @param fastDateFormat {@link FastDateFormat} to use for index names + */ + protected AbstractElasticSearchIndexRequestBuilderFactory(FastDateFormat fastDateFormat) { + this.fastDateFormat = fastDateFormat; + } + + /** + * @see Configurable + */ + @Override + public abstract void configure(Context arg0); + + /** + * @see ConfigurableComponent + */ + @Override + public abstract void configure(ComponentConfiguration arg0); + + /** + * Creates and prepares an {@link IndexRequestBuilder} from the supplied + * {@link Client} via delegation to the subclass-hook template methods + * {@link #getIndexName(String, long)} and + * {@link #prepareIndexRequest(IndexRequestBuilder, String, String, Event)} + */ + @Override + public IndexRequestBuilder createIndexRequest(Client client, + String indexPrefix, String indexType, Event event) throws IOException { + IndexRequestBuilder 
request = prepareIndex(client); + String realIndexPrefix = BucketPath.escapeString(indexPrefix, event.getHeaders()); + String realIndexType = BucketPath.escapeString(indexType, event.getHeaders()); + + TimestampedEvent timestampedEvent = new TimestampedEvent(event); + long timestamp = timestampedEvent.getTimestamp(); + + String indexName = getIndexName(realIndexPrefix, timestamp); + prepareIndexRequest(request, indexName, realIndexType, timestampedEvent); + return request; + } + + @VisibleForTesting + IndexRequestBuilder prepareIndex(Client client) { + return client.prepareIndex(); + } + + /** + * Gets the name of the index to use for an index request + * @param indexPrefix + * Prefix of index name to use -- as configured on the sink + * @param timestamp + * timestamp (millis) to format / use + * @return index name of the form 'indexPrefix-formattedTimestamp' + */ + protected String getIndexName(String indexPrefix, long timestamp) { + return new StringBuilder(indexPrefix).append('-') + .append(fastDateFormat.format(timestamp)).toString(); + } + + /** + * Prepares an ElasticSearch {@link IndexRequestBuilder} instance + * @param indexRequest + * The (empty) ElasticSearch {@link IndexRequestBuilder} to prepare + * @param indexName + * Index name to use -- as per {@link #getIndexName(String, long)} + * @param indexType + * Index type to use -- as configured on the sink + * @param event + * Flume event to serialize and add to index request + * @throws IOException + * If an error occurs e.g. 
during serialization + */ + protected abstract void prepareIndexRequest( + IndexRequestBuilder indexRequest, String indexName, + String indexType, Event event) throws IOException; + +} \ No newline at end of file diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java new file mode 100644 index 0000000..83c3ffd --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.flume.sink.elasticsearch; +import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder; + +import java.io.IOException; +import java.nio.charset.Charset; + +import org.elasticsearch.common.jackson.core.JsonParseException; +import org.elasticsearch.common.xcontent.XContentBuilder; +import org.elasticsearch.common.xcontent.XContentFactory; +import org.elasticsearch.common.xcontent.XContentParser; +import org.elasticsearch.common.xcontent.XContentType; + +/** + * Utility methods for using ElasticSearch {@link XContentBuilder} + */ +public class ContentBuilderUtil { + + private static final Charset charset = Charset.defaultCharset(); + + private ContentBuilderUtil() { + } + + public static void appendField(XContentBuilder builder, String field, + byte[] data) throws IOException { + XContentType contentType = XContentFactory.xContentType(data); + if (contentType == null) { + addSimpleField(builder, field, data); + } else { + addComplexField(builder, field, contentType, data); + } + } + + public static void addSimpleField(XContentBuilder builder, String fieldName, + byte[] data) throws IOException { + builder.field(fieldName, new String(data, charset)); + } + + public static void addComplexField(XContentBuilder builder, String fieldName, + XContentType contentType, byte[] data) throws IOException { + XContentParser parser = null; + try { + // Elasticsearch will accept JSON directly but we need to validate that + // the incoming event is JSON first. Sadly, the elasticsearch JSON parser + // is a stream parser so we need to instantiate it, parse the event to + // validate it, then instantiate it again to provide the JSON to + // elasticsearch. + // If validation fails then the incoming event is submitted to + // elasticsearch as plain text. 
+ parser = XContentFactory.xContent(contentType).createParser(data); + while (parser.nextToken() != null) {}; + + // If the JSON is valid then include it + parser = XContentFactory.xContent(contentType).createParser(data); + // Add the field name, but not the value. + builder.field(fieldName); + // This will add the whole parsed content as the value of the field. + builder.copyCurrentStructure(parser); + } catch (JsonParseException ex) { + // If we get an exception here the most likely cause is nested JSON that + // can't be figured out in the body. At this point just push it through + // as is + addSimpleField(builder, fieldName, data); + } finally { + if (parser != null) { + parser.close(); + } + } + } +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchDynamicSerializer.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchDynamicSerializer.java new file mode 100644 index 0000000..aa7ad39 --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchDynamicSerializer.java @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. 
See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.elasticsearch; + +import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder; + +import java.io.IOException; +import java.util.Map; + +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.conf.ComponentConfiguration; +import org.elasticsearch.common.xcontent.XContentBuilder; + +/** + * Basic serializer that serializes the event body and header fields into + * individual fields

+ * + * A best effort will be used to determine the content-type; if it cannot be + * determined, fields will be indexed as Strings + */ +public class ElasticSearchDynamicSerializer implements + ElasticSearchEventSerializer { + + @Override + public void configure(Context context) { + // NO-OP... + } + + @Override + public void configure(ComponentConfiguration conf) { + // NO-OP... + } + + @Override + public XContentBuilder getContentBuilder(Event event) throws IOException { + XContentBuilder builder = jsonBuilder().startObject(); + appendBody(builder, event); + appendHeaders(builder, event); + return builder; + } + + private void appendBody(XContentBuilder builder, Event event) + throws IOException { + ContentBuilderUtil.appendField(builder, "body", event.getBody()); + } + + private void appendHeaders(XContentBuilder builder, Event event) + throws IOException { + Map<String, String> headers = event.getHeaders(); + for (String key : headers.keySet()) { + ContentBuilderUtil.appendField(builder, key, + headers.get(key).getBytes(charset)); + } + } + +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchEventSerializer.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchEventSerializer.java new file mode 100644 index 0000000..c89d627 --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchEventSerializer.java @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License.
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.elasticsearch; + +import java.io.IOException; +import java.nio.charset.Charset; + +import org.apache.flume.Event; +import org.apache.flume.conf.Configurable; +import org.apache.flume.conf.ConfigurableComponent; +import org.elasticsearch.common.io.BytesStream; + +/** + * Interface for an event serializer which serializes the headers and body of an + * event to write them to ElasticSearch. This is configurable, so any config + * params required should be taken through this. + */ +public interface ElasticSearchEventSerializer extends Configurable, + ConfigurableComponent { + + public static final Charset charset = Charset.defaultCharset(); + + /** + * Return an {@link BytesStream} made up of the serialized flume event + * @param event + * The flume event to serialize + * @return A {@link BytesStream} used to write to ElasticSearch + * @throws IOException + * If an error occurs during serialization + */ + abstract BytesStream getContentBuilder(Event event) throws IOException; +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchIndexRequestBuilderFactory.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchIndexRequestBuilderFactory.java new file mode 100644 index 0000000..f76308c --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchIndexRequestBuilderFactory.java @@ -0,0 +1,60 @@ +/* + * Licensed to the 
Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.elasticsearch; + +import org.apache.commons.lang.time.FastDateFormat; +import org.apache.flume.Event; +import org.apache.flume.conf.Configurable; +import org.apache.flume.conf.ConfigurableComponent; +import org.elasticsearch.action.index.IndexRequestBuilder; +import org.elasticsearch.client.Client; + +import java.io.IOException; +import java.util.TimeZone; + +/** + * Interface for creating ElasticSearch {@link IndexRequestBuilder} instances + * from serialized flume events. This is configurable, so any config params + * required should be taken through this. 
+ */ +public interface ElasticSearchIndexRequestBuilderFactory extends Configurable, + ConfigurableComponent { + + static final FastDateFormat df = FastDateFormat.getInstance("yyyy-MM-dd", + TimeZone.getTimeZone("Etc/UTC")); + + /** + * @param client + * ElasticSearch {@link Client} to prepare index from + * @param indexPrefix + * Prefix of index name to use -- as configured on the sink + * @param indexType + * Index type to use -- as configured on the sink + * @param event + * Flume event to serialize and add to index request + * @return prepared ElasticSearch {@link IndexRequestBuilder} instance + * @throws IOException + * If an error occurs e.g. during serialization + */ + IndexRequestBuilder createIndexRequest(Client client, String indexPrefix, + String indexType, Event event) throws IOException; + + + +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchLogStashEventSerializer.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchLogStashEventSerializer.java new file mode 100644 index 0000000..3638368 --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchLogStashEventSerializer.java @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.elasticsearch; + +import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder; + +import java.io.IOException; +import java.io.UnsupportedEncodingException; +import java.util.Date; +import java.util.Map; + +import org.apache.commons.lang.StringUtils; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.conf.ComponentConfiguration; +import org.elasticsearch.common.collect.Maps; +import org.elasticsearch.common.xcontent.XContentBuilder; + +/** + * Serialize flume events into the same format LogStash uses

+ * + * This can be used to send events to ElasticSearch and use clients such as + * Kibana which expect Logstash formatted indexes + * + *
+ * {
+ *    "@timestamp": "2010-12-21T21:48:33.309258Z",
+ *    "@tags": [ "array", "of", "tags" ],
+ *    "@type": "string",
+ *    "@source": "source of the event, usually a URL.",
+ *    "@source_host": "",
+ *    "@source_path": "",
+ *    "@fields":{
+ *       # a set of fields for this event
+ *       "user": "jordan",
+ *       "command": "shutdown -r"
+ *     },
+ *     "@message": "the original plain-text message"
+ *   }
+ * 
+ * + * If the following headers are present, they will map to the above logstash + * output as long as the logstash fields are not already present.

+ * + *
+ *  timestamp: long -> @timestamp:Date
+ *  host: String -> @source_host: String
+ *  src_path: String -> @source_path: String
+ *  type: String -> @type: String
+ *  source: String -> @source: String
+ * 
+ * + * @see https + * ://github.com/logstash/logstash/wiki/logstash%27s-internal-message- + * format + */ +public class ElasticSearchLogStashEventSerializer implements + ElasticSearchEventSerializer { + + @Override + public XContentBuilder getContentBuilder(Event event) throws IOException { + XContentBuilder builder = jsonBuilder().startObject(); + appendBody(builder, event); + appendHeaders(builder, event); + return builder; + } + + private void appendBody(XContentBuilder builder, Event event) + throws IOException, UnsupportedEncodingException { + byte[] body = event.getBody(); + ContentBuilderUtil.appendField(builder, "@message", body); + } + + private void appendHeaders(XContentBuilder builder, Event event) + throws IOException { + Map<String, String> headers = Maps.newHashMap(event.getHeaders()); + + String timestamp = headers.get("timestamp"); + if (!StringUtils.isBlank(timestamp) + && StringUtils.isBlank(headers.get("@timestamp"))) { + long timestampMs = Long.parseLong(timestamp); + builder.field("@timestamp", new Date(timestampMs)); + } + + String source = headers.get("source"); + if (!StringUtils.isBlank(source) + && StringUtils.isBlank(headers.get("@source"))) { + ContentBuilderUtil.appendField(builder, "@source", + source.getBytes(charset)); + } + + String type = headers.get("type"); + if (!StringUtils.isBlank(type) + && StringUtils.isBlank(headers.get("@type"))) { + ContentBuilderUtil.appendField(builder, "@type", type.getBytes(charset)); + } + + String host = headers.get("host"); + if (!StringUtils.isBlank(host) + && StringUtils.isBlank(headers.get("@source_host"))) { + ContentBuilderUtil.appendField(builder, "@source_host", + host.getBytes(charset)); + } + + String srcPath = headers.get("src_path"); + if (!StringUtils.isBlank(srcPath) + && StringUtils.isBlank(headers.get("@source_path"))) { + ContentBuilderUtil.appendField(builder, "@source_path", + srcPath.getBytes(charset)); + } + + builder.startObject("@fields"); + for (String key : headers.keySet()) { + byte[]
val = headers.get(key).getBytes(charset); + ContentBuilderUtil.appendField(builder, key, val); + } + builder.endObject(); + } + + @Override + public void configure(Context context) { + // NO-OP... + } + + @Override + public void configure(ComponentConfiguration conf) { + // NO-OP... + } +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchSink.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchSink.java new file mode 100644 index 0000000..ebafb9f --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchSink.java @@ -0,0 +1,428 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.flume.sink.elasticsearch; + +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.BATCH_SIZE; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.CLUSTER_NAME; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.DEFAULT_CLUSTER_NAME; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.DEFAULT_INDEX_NAME; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.DEFAULT_INDEX_TYPE; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.DEFAULT_TTL; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.HOSTNAMES; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.INDEX_NAME; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.INDEX_TYPE; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.SERIALIZER; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.SERIALIZER_PREFIX; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.TTL; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.TTL_REGEX; +import org.apache.commons.lang.StringUtils; +import org.apache.flume.Channel; +import org.apache.flume.Context; +import org.apache.flume.CounterGroup; +import org.apache.flume.Event; +import org.apache.flume.EventDeliveryException; +import org.apache.flume.Transaction; +import org.apache.flume.formatter.output.BucketPath; +import org.apache.flume.conf.Configurable; +import org.apache.flume.instrumentation.SinkCounter; +import org.apache.flume.sink.AbstractSink; +import org.apache.flume.sink.elasticsearch.client.ElasticSearchClient; +import org.apache.flume.sink.elasticsearch.client.ElasticSearchClientFactory; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.google.common.annotations.VisibleForTesting; +import 
com.google.common.base.Preconditions; +import com.google.common.base.Throwables; + +import java.util.concurrent.TimeUnit; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.CLIENT_PREFIX; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.CLIENT_TYPE; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.DEFAULT_CLIENT_TYPE; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.DEFAULT_INDEX_NAME_BUILDER_CLASS; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.DEFAULT_SERIALIZER_CLASS; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.INDEX_NAME_BUILDER; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.INDEX_NAME_BUILDER_PREFIX; + +/** + * A sink which reads events from a channel and writes them to ElasticSearch + * based on the work done by https://github.com/Aconex/elasticflume.git.

+ * + * This sink supports batch reading of events from the channel and writing them + * to ElasticSearch.

+ * + * Indexes will be rolled daily using the format 'indexname-yyyy-MM-dd' to allow + * easier management of the index

+ * + * This sink must be configured with the mandatory parameters detailed in + * {@link ElasticSearchSinkConstants}

As a secondary step, it is recommended that + * the ElasticSearch indexes be optimized for the specified serializer. This is + * not handled by the sink but is typically done by deploying a config template + * alongside the ElasticSearch deployment

+ * + * @see http + * ://www.elasticsearch.org/guide/reference/api/admin-indices-templates. + * html + */ +public class ElasticSearchSink extends AbstractSink implements Configurable { + + private static final Logger logger = LoggerFactory + .getLogger(ElasticSearchSink.class); + + // Used for testing + private boolean isLocal = false; + private final CounterGroup counterGroup = new CounterGroup(); + + private static final int defaultBatchSize = 100; + + private int batchSize = defaultBatchSize; + private long ttlMs = DEFAULT_TTL; + private String clusterName = DEFAULT_CLUSTER_NAME; + private String indexName = DEFAULT_INDEX_NAME; + private String indexType = DEFAULT_INDEX_TYPE; + private String clientType = DEFAULT_CLIENT_TYPE; + private final Pattern pattern = Pattern.compile(TTL_REGEX, + Pattern.CASE_INSENSITIVE); + private Matcher matcher = pattern.matcher(""); + + private String[] serverAddresses = null; + + private ElasticSearchClient client = null; + private Context elasticSearchClientContext = null; + + private ElasticSearchIndexRequestBuilderFactory indexRequestFactory; + private ElasticSearchEventSerializer eventSerializer; + private IndexNameBuilder indexNameBuilder; + private SinkCounter sinkCounter; + + /** + * Create an {@link ElasticSearchSink} configured using the supplied + * configuration + */ + public ElasticSearchSink() { + this(false); + } + + /** + * Create an {@link ElasticSearchSink}

+ * + * @param isLocal + * If true, the sink will be configured to only talk to an + * ElasticSearch instance hosted in the same JVM; should always be + * false in production + * + */ + @VisibleForTesting + ElasticSearchSink(boolean isLocal) { + this.isLocal = isLocal; + } + + @VisibleForTesting + String[] getServerAddresses() { + return serverAddresses; + } + + @VisibleForTesting + String getClusterName() { + return clusterName; + } + + @VisibleForTesting + String getIndexName() { + return indexName; + } + + @VisibleForTesting + String getIndexType() { + return indexType; + } + + @VisibleForTesting + long getTTLMs() { + return ttlMs; + } + + @VisibleForTesting + ElasticSearchEventSerializer getEventSerializer() { + return eventSerializer; + } + + @VisibleForTesting + IndexNameBuilder getIndexNameBuilder() { + return indexNameBuilder; + } + + @Override + public Status process() throws EventDeliveryException { + logger.debug("processing..."); + Status status = Status.READY; + Channel channel = getChannel(); + Transaction txn = channel.getTransaction(); + try { + txn.begin(); + int count; + for (count = 0; count < batchSize; ++count) { + Event event = channel.take(); + + if (event == null) { + break; + } + String realIndexType = BucketPath.escapeString(indexType, event.getHeaders()); + client.addEvent(event, indexNameBuilder, realIndexType, ttlMs); + } + + if (count <= 0) { + sinkCounter.incrementBatchEmptyCount(); + counterGroup.incrementAndGet("channel.underflow"); + status = Status.BACKOFF; + } else { + if (count < batchSize) { + sinkCounter.incrementBatchUnderflowCount(); + status = Status.BACKOFF; + } else { + sinkCounter.incrementBatchCompleteCount(); + } + + sinkCounter.addToEventDrainAttemptCount(count); + client.execute(); + } + txn.commit(); + sinkCounter.addToEventDrainSuccessCount(count); + counterGroup.incrementAndGet("transaction.success"); + } catch (Throwable ex) { + try { + txn.rollback(); + counterGroup.incrementAndGet("transaction.rollback"); + } catch
(Exception ex2) { + logger.error( + "Exception in rollback. Rollback might not have been successful.", + ex2); + } + + if (ex instanceof Error || ex instanceof RuntimeException) { + logger.error("Failed to commit transaction. Transaction rolled back.", + ex); + Throwables.propagate(ex); + } else { + logger.error("Failed to commit transaction. Transaction rolled back.", + ex); + throw new EventDeliveryException( + "Failed to commit transaction. Transaction rolled back.", ex); + } + } finally { + txn.close(); + } + return status; + } + + @Override + public void configure(Context context) { + if (!isLocal) { + if (StringUtils.isNotBlank(context.getString(HOSTNAMES))) { + serverAddresses = StringUtils.deleteWhitespace( + context.getString(HOSTNAMES)).split(","); + } + Preconditions.checkState(serverAddresses != null + && serverAddresses.length > 0, "Missing Param:" + HOSTNAMES); + } + + if (StringUtils.isNotBlank(context.getString(INDEX_NAME))) { + this.indexName = context.getString(INDEX_NAME); + } + + if (StringUtils.isNotBlank(context.getString(INDEX_TYPE))) { + this.indexType = context.getString(INDEX_TYPE); + } + + if (StringUtils.isNotBlank(context.getString(CLUSTER_NAME))) { + this.clusterName = context.getString(CLUSTER_NAME); + } + + if (StringUtils.isNotBlank(context.getString(BATCH_SIZE))) { + this.batchSize = Integer.parseInt(context.getString(BATCH_SIZE)); + } + + if (StringUtils.isNotBlank(context.getString(TTL))) { + this.ttlMs = parseTTL(context.getString(TTL)); + Preconditions.checkState(ttlMs > 0, TTL + + " must be greater than 0 or not set."); + } + + if (StringUtils.isNotBlank(context.getString(CLIENT_TYPE))) { + clientType = context.getString(CLIENT_TYPE); + } + + elasticSearchClientContext = new Context(); + elasticSearchClientContext.putAll(context.getSubProperties(CLIENT_PREFIX)); + + String serializerClazz = DEFAULT_SERIALIZER_CLASS; + if (StringUtils.isNotBlank(context.getString(SERIALIZER))) { + serializerClazz = 
context.getString(SERIALIZER); + } + + Context serializerContext = new Context(); + serializerContext.putAll(context.getSubProperties(SERIALIZER_PREFIX)); + + try { + @SuppressWarnings("unchecked") + Class<? extends Configurable> clazz = (Class<? extends Configurable>) Class + .forName(serializerClazz); + Configurable serializer = clazz.newInstance(); + + if (serializer instanceof ElasticSearchIndexRequestBuilderFactory) { + indexRequestFactory + = (ElasticSearchIndexRequestBuilderFactory) serializer; + indexRequestFactory.configure(serializerContext); + } else if (serializer instanceof ElasticSearchEventSerializer) { + eventSerializer = (ElasticSearchEventSerializer) serializer; + eventSerializer.configure(serializerContext); + } else { + throw new IllegalArgumentException(serializerClazz + + " is not an ElasticSearchEventSerializer"); + } + } catch (Exception e) { + logger.error("Could not instantiate event serializer.", e); + Throwables.propagate(e); + } + + if (sinkCounter == null) { + sinkCounter = new SinkCounter(getName()); + } + + String indexNameBuilderClass = DEFAULT_INDEX_NAME_BUILDER_CLASS; + if (StringUtils.isNotBlank(context.getString(INDEX_NAME_BUILDER))) { + indexNameBuilderClass = context.getString(INDEX_NAME_BUILDER); + } + + Context indexnameBuilderContext = new Context(); + indexnameBuilderContext.putAll( + context.getSubProperties(INDEX_NAME_BUILDER_PREFIX)); + + try { + @SuppressWarnings("unchecked") + Class<? extends IndexNameBuilder> clazz + = (Class<? extends IndexNameBuilder>) Class + .forName(indexNameBuilderClass); + indexNameBuilder = clazz.newInstance(); + indexnameBuilderContext.put(INDEX_NAME, indexName); + indexNameBuilder.configure(indexnameBuilderContext); + } catch (Exception e) { + logger.error("Could not instantiate index name builder.", e); + Throwables.propagate(e); + } + + if (sinkCounter == null) { + sinkCounter = new SinkCounter(getName()); + } + + Preconditions.checkState(StringUtils.isNotBlank(indexName), + "Missing Param:" + INDEX_NAME); + Preconditions.checkState(StringUtils.isNotBlank(indexType), + "Missing Param:" +
INDEX_TYPE); + Preconditions.checkState(StringUtils.isNotBlank(clusterName), + "Missing Param:" + CLUSTER_NAME); + Preconditions.checkState(batchSize >= 1, BATCH_SIZE + + " must be greater than 0"); + } + + @Override + public void start() { + ElasticSearchClientFactory clientFactory = new ElasticSearchClientFactory(); + + logger.info("ElasticSearch sink {} started", getName()); + sinkCounter.start(); + try { + if (isLocal) { + client = clientFactory.getLocalClient( + clientType, eventSerializer, indexRequestFactory); + } else { + client = clientFactory.getClient(clientType, serverAddresses, + clusterName, eventSerializer, indexRequestFactory); + client.configure(elasticSearchClientContext); + } + sinkCounter.incrementConnectionCreatedCount(); + } catch (Exception ex) { + logger.error("Could not create ElasticSearch client.", ex); + sinkCounter.incrementConnectionFailedCount(); + if (client != null) { + client.close(); + sinkCounter.incrementConnectionClosedCount(); + } + } + + super.start(); + } + + @Override + public void stop() { + logger.info("ElasticSearch sink {} stopping", getName()); + if (client != null) { + client.close(); + } + sinkCounter.incrementConnectionClosedCount(); + sinkCounter.stop(); + super.stop(); + } + + /* + * Returns TTL value of ElasticSearch index in milliseconds when TTL specifier + * is "ms" / "s" / "m" / "h" / "d" / "w". In case of unknown specifier TTL is + * not set. When specifier is not provided it defaults to days in milliseconds + * where the number of days is parsed integer from TTL string provided by + * user.

Elasticsearch accepts TTL values in the format 1d / 1w / 1ms / 1s / 1h / 1m, + * where the suffix is a time unit: d (days), h (hours), m (minutes), s + * (seconds), ms (milliseconds) or w (weeks). Elasticsearch itself uses + * milliseconds as the default unit. + * http://www.elasticsearch.org/guide/reference/mapping/ttl-field/. + * + * @param ttl TTL value provided by user in flume configuration file for the + * sink + * + * @return the ttl value in milliseconds + */ + private long parseTTL(String ttl) { + matcher = matcher.reset(ttl); + while (matcher.find()) { + if (matcher.group(2).equals("ms")) { + return Long.parseLong(matcher.group(1)); + } else if (matcher.group(2).equals("s")) { + return TimeUnit.SECONDS.toMillis(Integer.parseInt(matcher.group(1))); + } else if (matcher.group(2).equals("m")) { + return TimeUnit.MINUTES.toMillis(Integer.parseInt(matcher.group(1))); + } else if (matcher.group(2).equals("h")) { + return TimeUnit.HOURS.toMillis(Integer.parseInt(matcher.group(1))); + } else if (matcher.group(2).equals("d")) { + return TimeUnit.DAYS.toMillis(Integer.parseInt(matcher.group(1))); + } else if (matcher.group(2).equals("w")) { + return TimeUnit.DAYS.toMillis(7 * Integer.parseInt(matcher.group(1))); + } else if (matcher.group(2).equals("")) { + logger.info("TTL qualifier is empty. Defaulting to day qualifier."); + return TimeUnit.DAYS.toMillis(Integer.parseInt(matcher.group(1))); + } else { + logger.debug("Unknown TTL qualifier provided. Setting TTL to 0."); + return 0; + } + } + logger.info("TTL not provided.
Skipping the TTL config by returning 0."); + return 0; + } +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchSinkConstants.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchSinkConstants.java new file mode 100644 index 0000000..da88def --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchSinkConstants.java @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.elasticsearch; + +public class ElasticSearchSinkConstants { + + /** + * Comma separated list of hostname:port, if the port is not present the + * default port '9300' will be used

+ * Example: + *
+   *  127.0.0.1:9200,127.0.0.2:9300
+   * 
+ */ + public static final String HOSTNAMES = "hostNames"; + + /** + * The name to index the document to, defaults to 'flume'

+ * The current date in the format 'yyyy-MM-dd' will be appended to this name, + * for example 'foo' will result in a daily index of 'foo-yyyy-MM-dd' + */ + public static final String INDEX_NAME = "indexName"; + + /** + * The type to index the document to, defaults to 'log' + */ + public static final String INDEX_TYPE = "indexType"; + + /** + * Name of the ElasticSearch cluster to connect to + */ + public static final String CLUSTER_NAME = "clusterName"; + + /** + * Maximum number of events the sink should take from the channel per + * transaction, if available. Defaults to 100 + */ + public static final String BATCH_SIZE = "batchSize"; + + /** + * TTL, in days unless a unit suffix (ms, s, m, h, d, w) is given. When set, + * expired documents are deleted automatically; if not set, they never are + */ + public static final String TTL = "ttl"; + + /** + * The fully qualified class name of the serializer the sink should use. + */ + public static final String SERIALIZER = "serializer"; + + /** + * Configuration to pass to the serializer. + */ + public static final String SERIALIZER_PREFIX = SERIALIZER + "."; + + /** + * The fully qualified class name of the index name builder the sink + * should use to determine name of index where the event should be sent. + */ + public static final String INDEX_NAME_BUILDER = "indexNameBuilder"; + + /** + * Prefix of the configuration properties that are passed to the + * index name builder. + */ + public static final String INDEX_NAME_BUILDER_PREFIX + = INDEX_NAME_BUILDER + "."; + + /** + * The client type used for sending bulks to ElasticSearch + */ + public static final String CLIENT_TYPE = "client"; + + /** + * The client prefix to extract the configuration that will be passed to + * elasticsearch client.
+ */ + public static final String CLIENT_PREFIX = CLIENT_TYPE + "."; + + /** + * DEFAULTS USED BY THE SINK + */ + + public static final int DEFAULT_PORT = 9300; + public static final int DEFAULT_TTL = -1; + public static final String DEFAULT_INDEX_NAME = "flume"; + public static final String DEFAULT_INDEX_TYPE = "log"; + public static final String DEFAULT_CLUSTER_NAME = "elasticsearch"; + public static final String DEFAULT_CLIENT_TYPE = "transport"; + public static final String TTL_REGEX = "^(\\d+)(\\D*)"; + public static final String DEFAULT_SERIALIZER_CLASS = "org.apache.flume." + + "sink.elasticsearch.ElasticSearchLogStashEventSerializer"; + public static final String DEFAULT_INDEX_NAME_BUILDER_CLASS = + "org.apache.flume.sink.elasticsearch.TimeBasedIndexNameBuilder"; +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/EventSerializerIndexRequestBuilderFactory.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/EventSerializerIndexRequestBuilderFactory.java new file mode 100644 index 0000000..d6cca50 --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/EventSerializerIndexRequestBuilderFactory.java @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.elasticsearch; + +import java.io.IOException; + +import org.apache.commons.lang.time.FastDateFormat; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.conf.ComponentConfiguration; +import org.elasticsearch.action.index.IndexRequestBuilder; +import org.elasticsearch.common.io.BytesStream; + +/** + * Default implementation of {@link ElasticSearchIndexRequestBuilderFactory}. + * It serializes flume events using the + * {@link ElasticSearchEventSerializer} instance configured on the sink. + */ +public class EventSerializerIndexRequestBuilderFactory + extends AbstractElasticSearchIndexRequestBuilderFactory { + + protected final ElasticSearchEventSerializer serializer; + + public EventSerializerIndexRequestBuilderFactory( + ElasticSearchEventSerializer serializer) { + this(serializer, ElasticSearchIndexRequestBuilderFactory.df); + } + + protected EventSerializerIndexRequestBuilderFactory( + ElasticSearchEventSerializer serializer, FastDateFormat fdf) { + super(fdf); + this.serializer = serializer; + } + + @Override + public void configure(Context context) { + serializer.configure(context); + } + + @Override + public void configure(ComponentConfiguration config) { + serializer.configure(config); + } + + @Override + protected void prepareIndexRequest(IndexRequestBuilder indexRequest, + String indexName, String indexType, Event event) throws IOException { + BytesStream contentBuilder = serializer.getContentBuilder(event); + indexRequest.setIndex(indexName) + .setType(indexType) + 
.setSource(contentBuilder.bytes()); + } +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/IndexNameBuilder.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/IndexNameBuilder.java new file mode 100644 index 0000000..1dd4415 --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/IndexNameBuilder.java @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.elasticsearch; + +import org.apache.flume.Event; +import org.apache.flume.conf.Configurable; +import org.apache.flume.conf.ConfigurableComponent; + +public interface IndexNameBuilder extends Configurable, + ConfigurableComponent { + /** + * Gets the name of the index to use for an index request + * @param event + * Event which determines index name + * @return index name of the form 'indexPrefix-indexDynamicName' + */ + public String getIndexName(Event event); + + /** + * Gets the prefix of index to use for an index request. 
+ * @param event + * Event which determines index name + * @return Index prefix name + */ + public String getIndexPrefix(Event event); +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/SimpleIndexNameBuilder.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/SimpleIndexNameBuilder.java new file mode 100644 index 0000000..801cac9 --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/SimpleIndexNameBuilder.java @@ -0,0 +1,46 @@ +/* + * Copyright 2014 Apache Software Foundation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flume.sink.elasticsearch; + +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.conf.ComponentConfiguration; +import org.apache.flume.formatter.output.BucketPath; + +public class SimpleIndexNameBuilder implements IndexNameBuilder { + + private String indexName; + + @Override + public String getIndexName(Event event) { + return BucketPath.escapeString(indexName, event.getHeaders()); + } + + @Override + public String getIndexPrefix(Event event) { + return BucketPath.escapeString(indexName, event.getHeaders()); + } + + @Override + public void configure(Context context) { + indexName = context.getString(ElasticSearchSinkConstants.INDEX_NAME); + } + + @Override + public void configure(ComponentConfiguration conf) { + } +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/TimeBasedIndexNameBuilder.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/TimeBasedIndexNameBuilder.java new file mode 100644 index 0000000..c651732 --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/TimeBasedIndexNameBuilder.java @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. 
See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.elasticsearch; + +import com.google.common.annotations.VisibleForTesting; +import org.apache.commons.lang.StringUtils; +import org.apache.commons.lang.time.FastDateFormat; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.conf.ComponentConfiguration; +import org.apache.flume.formatter.output.BucketPath; + +import java.util.TimeZone; + +/** + * Default index name builder. It builds the index name from the configured + * prefix and the event timestamp. The default name format is "prefix-yyyy-MM-dd". + */ +public class TimeBasedIndexNameBuilder implements + IndexNameBuilder { + + public static final String DATE_FORMAT = "dateFormat"; + public static final String TIME_ZONE = "timeZone"; + + public static final String DEFAULT_DATE_FORMAT = "yyyy-MM-dd"; + public static final String DEFAULT_TIME_ZONE = "Etc/UTC"; + + private FastDateFormat fastDateFormat = FastDateFormat.getInstance(DEFAULT_DATE_FORMAT, + TimeZone.getTimeZone(DEFAULT_TIME_ZONE)); + + private String indexPrefix; + + @VisibleForTesting + FastDateFormat getFastDateFormat() { + return fastDateFormat; + } + + /** + * Gets the name of the index to use for an index request + * @param event + * Event for which the name of index has to be prepared + * @return index name of the form 'indexPrefix-formattedTimestamp' + */ + @Override + public String getIndexName(Event event) { + TimestampedEvent timestampedEvent = new TimestampedEvent(event); + long timestamp = timestampedEvent.getTimestamp(); + String realIndexPrefix = BucketPath.escapeString(indexPrefix, event.getHeaders()); + return new StringBuilder(realIndexPrefix).append('-') + .append(fastDateFormat.format(timestamp)).toString(); + } + + @Override + public String getIndexPrefix(Event event) { + return BucketPath.escapeString(indexPrefix, event.getHeaders()); + } + + @Override + public void
configure(Context context) { + String dateFormatString = context.getString(DATE_FORMAT); + String timeZoneString = context.getString(TIME_ZONE); + if (StringUtils.isBlank(dateFormatString)) { + dateFormatString = DEFAULT_DATE_FORMAT; + } + if (StringUtils.isBlank(timeZoneString)) { + timeZoneString = DEFAULT_TIME_ZONE; + } + fastDateFormat = FastDateFormat.getInstance(dateFormatString, + TimeZone.getTimeZone(timeZoneString)); + indexPrefix = context.getString(ElasticSearchSinkConstants.INDEX_NAME); + } + + @Override + public void configure(ComponentConfiguration conf) { + } +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/TimestampedEvent.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/TimestampedEvent.java new file mode 100644 index 0000000..c056839 --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/TimestampedEvent.java @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.flume.sink.elasticsearch; + +import com.google.common.collect.Maps; +import org.apache.commons.lang.StringUtils; +import org.apache.flume.Event; +import org.apache.flume.event.SimpleEvent; +import org.joda.time.DateTimeUtils; + +import java.util.Map; + +/** + * {@link org.apache.flume.Event} implementation that has a timestamp. + * The timestamp is taken from (in order of precedence):
<ol>
+ * <li>The "timestamp" header of the base event, if present</li>
+ * <li>The "@timestamp" header of the base event, if present</li>
+ * <li>The current time in millis, otherwise</li>
+ * </ol>
+ */ +final class TimestampedEvent extends SimpleEvent { + + private final long timestamp; + + TimestampedEvent(Event base) { + setBody(base.getBody()); + Map<String, String> headers = Maps.newHashMap(base.getHeaders()); + String timestampString = headers.get("timestamp"); + if (StringUtils.isBlank(timestampString)) { + timestampString = headers.get("@timestamp"); + } + if (StringUtils.isBlank(timestampString)) { + this.timestamp = DateTimeUtils.currentTimeMillis(); + headers.put("timestamp", String.valueOf(timestamp)); + } else { + this.timestamp = Long.valueOf(timestampString); + } + setHeaders(headers); + } + + long getTimestamp() { + return timestamp; + } +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/ElasticSearchClient.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/ElasticSearchClient.java new file mode 100644 index 0000000..655e00a --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/ElasticSearchClient.java @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License.
+ */ +package org.apache.flume.sink.elasticsearch.client; + +import org.apache.flume.Event; +import org.apache.flume.conf.Configurable; +import org.apache.flume.sink.elasticsearch.IndexNameBuilder; + +/** + * Interface for an ElasticSearch client which is responsible for sending bulks + * of events to ElasticSearch. + */ +public interface ElasticSearchClient extends Configurable { + + /** + * Close connection to elastic search in client + */ + void close(); + + /** + * Add new event to the bulk + * + * @param event + * Flume Event + * @param indexNameBuilder + * Index name builder which generates name of index to feed + * @param indexType + * Name of type of document which will be sent to the elasticsearch cluster + * @param ttlMs + * Time to live expressed in milliseconds. Value <= 0 is ignored + * @throws Exception + */ + public void addEvent(Event event, IndexNameBuilder indexNameBuilder, + String indexType, long ttlMs) throws Exception; + + /** + * Sends bulk to the elasticsearch cluster + * + * @throws Exception + */ + void execute() throws Exception; +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/ElasticSearchClientFactory.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/ElasticSearchClientFactory.java new file mode 100644 index 0000000..986fb2b --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/ElasticSearchClientFactory.java @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.elasticsearch.client; + +import org.apache.flume.sink.elasticsearch.ElasticSearchEventSerializer; +import org.apache.flume.sink.elasticsearch.ElasticSearchIndexRequestBuilderFactory; + +/** + * Internal ElasticSearch client factory. Responsible for creating instance + * of ElasticSearch clients. + */ +public class ElasticSearchClientFactory { + public static final String TransportClient = "transport"; + public static final String RestClient = "rest"; + + /** + * + * @param clientType + * String representation of client type + * @param hostNames + * Array of strings that represents hostnames with ports (hostname:port) + * @param clusterName + * Elasticsearch cluster name used only by Transport Client + * @param serializer + * Serializer of flume events to elasticsearch documents + * @return + */ + public ElasticSearchClient getClient(String clientType, String[] hostNames, + String clusterName, ElasticSearchEventSerializer serializer, + ElasticSearchIndexRequestBuilderFactory indexBuilder) throws NoSuchClientTypeException { + if (clientType.equalsIgnoreCase(TransportClient) && serializer != null) { + return new ElasticSearchTransportClient(hostNames, clusterName, serializer); + } else if (clientType.equalsIgnoreCase(TransportClient) && indexBuilder != null) { + return new ElasticSearchTransportClient(hostNames, clusterName, indexBuilder); + } else if (clientType.equalsIgnoreCase(RestClient) && serializer != null) { + return new ElasticSearchRestClient(hostNames, serializer); + } + throw new 
NoSuchClientTypeException(); + } + + /** + * Used for tests only. Creates local elasticsearch instance client. + * + * @param clientType Name of client to use + * @param serializer Serializer for the event + * @param indexBuilder Index builder factory + * + * @return Local elastic search instance client + */ + public ElasticSearchClient getLocalClient(String clientType, + ElasticSearchEventSerializer serializer, + ElasticSearchIndexRequestBuilderFactory indexBuilder) + throws NoSuchClientTypeException { + if (clientType.equalsIgnoreCase(TransportClient) && serializer != null) { + return new ElasticSearchTransportClient(serializer); + } else if (clientType.equalsIgnoreCase(TransportClient) && indexBuilder != null) { + return new ElasticSearchTransportClient(indexBuilder); + } else if (clientType.equalsIgnoreCase(RestClient)) { + } + throw new NoSuchClientTypeException(); + } +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/ElasticSearchRestClient.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/ElasticSearchRestClient.java new file mode 100644 index 0000000..e51efe2 --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/ElasticSearchRestClient.java @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.elasticsearch.client; + +import com.google.common.annotations.VisibleForTesting; +import com.google.gson.Gson; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.EventDeliveryException; +import org.apache.flume.sink.elasticsearch.ElasticSearchEventSerializer; +import org.apache.flume.sink.elasticsearch.IndexNameBuilder; +import org.apache.http.HttpResponse; +import org.apache.http.HttpStatus; +import org.apache.http.client.HttpClient; +import org.apache.http.client.methods.HttpPost; +import org.apache.http.entity.StringEntity; +import org.apache.http.impl.client.DefaultHttpClient; +import org.apache.http.util.EntityUtils; +import org.elasticsearch.common.bytes.BytesReference; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Arrays; +import java.util.HashMap; +import java.util.Map; + +/** + * Rest ElasticSearch client which is responsible for sending bulks of events to + * ElasticSearch using ElasticSearch HTTP API. This is configurable, so any + * config params required should be taken through this. 
+ */ +public class ElasticSearchRestClient implements ElasticSearchClient { + + private static final String INDEX_OPERATION_NAME = "index"; + private static final String INDEX_PARAM = "_index"; + private static final String TYPE_PARAM = "_type"; + private static final String TTL_PARAM = "_ttl"; + private static final String BULK_ENDPOINT = "_bulk"; + + private static final Logger logger = LoggerFactory.getLogger(ElasticSearchRestClient.class); + + private final ElasticSearchEventSerializer serializer; + private final RoundRobinList<String> serversList; + + private StringBuilder bulkBuilder; + private HttpClient httpClient; + + public ElasticSearchRestClient(String[] hostNames, + ElasticSearchEventSerializer serializer) { + + for (int i = 0; i < hostNames.length; ++i) { + if (!hostNames[i].contains("http://") && !hostNames[i].contains("https://")) { + hostNames[i] = "http://" + hostNames[i]; + } + } + this.serializer = serializer; + + serversList = new RoundRobinList<String>(Arrays.asList(hostNames)); + httpClient = new DefaultHttpClient(); + bulkBuilder = new StringBuilder(); + } + + @VisibleForTesting + public ElasticSearchRestClient(String[] hostNames, + ElasticSearchEventSerializer serializer, HttpClient client) { + this(hostNames, serializer); + httpClient = client; + } + + @Override + public void configure(Context context) { + } + + @Override + public void close() { + } + + @Override + public void addEvent(Event event, IndexNameBuilder indexNameBuilder, String indexType, + long ttlMs) throws Exception { + BytesReference content = serializer.getContentBuilder(event).bytes(); + Map<String, Map<String, String>> parameters = new HashMap<String, Map<String, String>>(); + Map<String, String> indexParameters = new HashMap<String, String>(); + indexParameters.put(INDEX_PARAM, indexNameBuilder.getIndexName(event)); + indexParameters.put(TYPE_PARAM, indexType); + if (ttlMs > 0) { + indexParameters.put(TTL_PARAM, Long.toString(ttlMs)); + } + parameters.put(INDEX_OPERATION_NAME, indexParameters); + + Gson gson = new Gson(); + synchronized (bulkBuilder) { +
bulkBuilder.append(gson.toJson(parameters)); + bulkBuilder.append("\n"); + bulkBuilder.append(content.toBytesArray().toUtf8()); + bulkBuilder.append("\n"); + } + } + + @Override + public void execute() throws Exception { + int statusCode = 0, triesCount = 0; + HttpResponse response = null; + String entity; + synchronized (bulkBuilder) { + entity = bulkBuilder.toString(); + bulkBuilder = new StringBuilder(); + } + + while (statusCode != HttpStatus.SC_OK && triesCount < serversList.size()) { + triesCount++; + String host = serversList.get(); + String url = host + "/" + BULK_ENDPOINT; + HttpPost httpRequest = new HttpPost(url); + httpRequest.setEntity(new StringEntity(entity)); + response = httpClient.execute(httpRequest); + statusCode = response.getStatusLine().getStatusCode(); + logger.info("Status code from elasticsearch: " + statusCode); + if (response.getEntity() != null) { + logger.debug("Status message from elasticsearch: " + + EntityUtils.toString(response.getEntity(), "UTF-8")); + } + } + + if (statusCode != HttpStatus.SC_OK) { + if (response.getEntity() != null) { + throw new EventDeliveryException(EntityUtils.toString(response.getEntity(), "UTF-8")); + } else { + throw new EventDeliveryException("Elasticsearch status code was: " + statusCode); + } + } + } +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/ElasticSearchTransportClient.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/ElasticSearchTransportClient.java new file mode 100644 index 0000000..2cf365e --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/ElasticSearchTransportClient.java @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.elasticsearch.client; + +import com.google.common.annotations.VisibleForTesting; +import java.io.IOException; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.EventDeliveryException; +import org.apache.flume.sink.elasticsearch.ElasticSearchEventSerializer; +import org.apache.flume.sink.elasticsearch.IndexNameBuilder; +import org.elasticsearch.action.bulk.BulkRequestBuilder; +import org.elasticsearch.action.bulk.BulkResponse; +import org.elasticsearch.action.index.IndexRequestBuilder; +import org.elasticsearch.client.Client; +import org.elasticsearch.client.transport.TransportClient; +import org.elasticsearch.common.settings.ImmutableSettings; +import org.elasticsearch.common.settings.Settings; +import org.elasticsearch.common.transport.InetSocketTransportAddress; +import org.elasticsearch.node.Node; +import org.elasticsearch.node.NodeBuilder; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Arrays; +import org.apache.flume.sink.elasticsearch.ElasticSearchIndexRequestBuilderFactory; + +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.DEFAULT_PORT; + +public class ElasticSearchTransportClient implements ElasticSearchClient { + + public static 
final Logger logger = LoggerFactory + .getLogger(ElasticSearchTransportClient.class); + + private InetSocketTransportAddress[] serverAddresses; + private ElasticSearchEventSerializer serializer; + private ElasticSearchIndexRequestBuilderFactory indexRequestBuilderFactory; + private BulkRequestBuilder bulkRequestBuilder; + + private Client client; + + @VisibleForTesting + InetSocketTransportAddress[] getServerAddresses() { + return serverAddresses; + } + + @VisibleForTesting + void setBulkRequestBuilder(BulkRequestBuilder bulkRequestBuilder) { + this.bulkRequestBuilder = bulkRequestBuilder; + } + + /** + * Transport client for external cluster + * + * @param hostNames + * @param clusterName + * @param serializer + */ + public ElasticSearchTransportClient(String[] hostNames, String clusterName, + ElasticSearchEventSerializer serializer) { + configureHostnames(hostNames); + this.serializer = serializer; + openClient(clusterName); + } + + public ElasticSearchTransportClient(String[] hostNames, String clusterName, + ElasticSearchIndexRequestBuilderFactory indexBuilder) { + configureHostnames(hostNames); + this.indexRequestBuilderFactory = indexBuilder; + openClient(clusterName); + } + + /** + * Local transport client only for testing + * + * @param indexBuilderFactory + */ + public ElasticSearchTransportClient(ElasticSearchIndexRequestBuilderFactory indexBuilderFactory) { + this.indexRequestBuilderFactory = indexBuilderFactory; + openLocalDiscoveryClient(); + } + + /** + * Local transport client only for testing + * + * @param serializer + */ + public ElasticSearchTransportClient(ElasticSearchEventSerializer serializer) { + this.serializer = serializer; + openLocalDiscoveryClient(); + } + + /** + * Used for testing + * + * @param client + * ElasticSearch Client + * @param serializer + * Event Serializer + */ + public ElasticSearchTransportClient(Client client, + ElasticSearchEventSerializer serializer) { + this.client = client; + this.serializer = serializer; + } + + 
/** + * Used for testing + */ + public ElasticSearchTransportClient(Client client, + ElasticSearchIndexRequestBuilderFactory requestBuilderFactory) + throws IOException { + this.client = client; + requestBuilderFactory.createIndexRequest(client, null, null, null); + } + + private void configureHostnames(String[] hostNames) { + logger.warn(Arrays.toString(hostNames)); + serverAddresses = new InetSocketTransportAddress[hostNames.length]; + for (int i = 0; i < hostNames.length; i++) { + String[] hostPort = hostNames[i].trim().split(":"); + String host = hostPort[0].trim(); + int port = hostPort.length == 2 ? Integer.parseInt(hostPort[1].trim()) + : DEFAULT_PORT; + serverAddresses[i] = new InetSocketTransportAddress(host, port); + } + } + + @Override + public void close() { + if (client != null) { + client.close(); + } + client = null; + } + + @Override + public void addEvent(Event event, IndexNameBuilder indexNameBuilder, + String indexType, long ttlMs) throws Exception { + if (bulkRequestBuilder == null) { + bulkRequestBuilder = client.prepareBulk(); + } + + IndexRequestBuilder indexRequestBuilder = null; + if (indexRequestBuilderFactory == null) { + indexRequestBuilder = client + .prepareIndex(indexNameBuilder.getIndexName(event), indexType) + .setSource(serializer.getContentBuilder(event).bytes()); + } else { + indexRequestBuilder = indexRequestBuilderFactory.createIndexRequest( + client, indexNameBuilder.getIndexPrefix(event), indexType, event); + } + + if (ttlMs > 0) { + indexRequestBuilder.setTTL(ttlMs); + } + bulkRequestBuilder.add(indexRequestBuilder); + } + + @Override + public void execute() throws Exception { + try { + BulkResponse bulkResponse = bulkRequestBuilder.execute().actionGet(); + if (bulkResponse.hasFailures()) { + throw new EventDeliveryException(bulkResponse.buildFailureMessage()); + } + } finally { + bulkRequestBuilder = client.prepareBulk(); + } + } + + /** + * Open client to elasticsearch cluster + * + * @param clusterName + */ + private void 
openClient(String clusterName) { + logger.info("Using ElasticSearch hostnames: {} ", + Arrays.toString(serverAddresses)); + Settings settings = ImmutableSettings.settingsBuilder() + .put("cluster.name", clusterName).build(); + + TransportClient transportClient = new TransportClient(settings); + for (InetSocketTransportAddress host : serverAddresses) { + transportClient.addTransportAddress(host); + } + if (client != null) { + client.close(); + } + client = transportClient; + } + + /* + * FOR TESTING ONLY... + * + * Opens a local discovery node for talking to an elasticsearch server running + * in the same JVM + */ + private void openLocalDiscoveryClient() { + logger.info("Using ElasticSearch AutoDiscovery mode"); + Node node = NodeBuilder.nodeBuilder().client(true).local(true).node(); + if (client != null) { + client.close(); + } + client = node.client(); + } + + @Override + public void configure(Context context) { + //To change body of implemented methods use File | Settings | File Templates. + } +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/NoSuchClientTypeException.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/NoSuchClientTypeException.java new file mode 100644 index 0000000..41fbe0d --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/NoSuchClientTypeException.java @@ -0,0 +1,23 @@ +/* + * Copyright 2014 Apache Software Foundation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flume.sink.elasticsearch.client; + +/** + * Exception class + */ +class NoSuchClientTypeException extends Exception { +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/RoundRobinList.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/RoundRobinList.java new file mode 100644 index 0000000..4cbbe91 --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/RoundRobinList.java @@ -0,0 +1,44 @@ +package org.apache.flume.sink.elasticsearch.client; + +import java.util.Collection; +import java.util.Iterator; + +/* + * Copyright 2014 Apache Software Foundation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +public class RoundRobinList<T> { + + private Iterator<T> iterator; + private final Collection<T> elements; + + public RoundRobinList(Collection<T> elements) { + this.elements = elements; + iterator = this.elements.iterator(); + } + + public synchronized T get() { + if (iterator.hasNext()) { + return iterator.next(); + } else { + iterator = elements.iterator(); + return iterator.next(); + } + } + + public int size() { + return elements.size(); + } +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/AbstractElasticSearchSinkTest.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/AbstractElasticSearchSinkTest.java new file mode 100644 index 0000000..9fbd747 --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/AbstractElasticSearchSinkTest.java @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.flume.sink.elasticsearch; + +import org.apache.flume.Channel; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.channel.MemoryChannel; +import org.apache.flume.conf.Configurables; +import org.elasticsearch.action.search.SearchResponse; +import org.elasticsearch.client.Client; +import org.elasticsearch.common.collect.Maps; +import org.elasticsearch.common.settings.ImmutableSettings; +import org.elasticsearch.common.settings.Settings; +import org.elasticsearch.gateway.Gateway; +import org.elasticsearch.index.query.QueryBuilder; +import org.elasticsearch.index.query.QueryBuilders; +import org.elasticsearch.node.Node; +import org.elasticsearch.node.NodeBuilder; +import org.elasticsearch.node.internal.InternalNode; +import org.elasticsearch.search.SearchHit; +import org.elasticsearch.search.SearchHits; +import org.joda.time.DateTimeUtils; +import org.junit.After; +import org.junit.Before; + +import java.util.Arrays; +import java.util.Comparator; +import java.util.Map; + +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.BATCH_SIZE; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.CLUSTER_NAME; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.INDEX_NAME; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.INDEX_TYPE; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.TTL; +import static org.junit.Assert.assertEquals; + +public abstract class AbstractElasticSearchSinkTest { + + static final String DEFAULT_INDEX_NAME = "flume"; + static final String DEFAULT_INDEX_TYPE = "log"; + static final String DEFAULT_CLUSTER_NAME = "elasticsearch"; + static final long FIXED_TIME_MILLIS = 123456789L; + + Node node; + Client client; + String timestampedIndexName; + Map parameters; + + void initDefaults() { + parameters = Maps.newHashMap(); + parameters.put(INDEX_NAME, 
DEFAULT_INDEX_NAME); + parameters.put(INDEX_TYPE, DEFAULT_INDEX_TYPE); + parameters.put(CLUSTER_NAME, DEFAULT_CLUSTER_NAME); + parameters.put(BATCH_SIZE, "1"); + parameters.put(TTL, "5"); + + timestampedIndexName = DEFAULT_INDEX_NAME + '-' + + ElasticSearchIndexRequestBuilderFactory.df.format(FIXED_TIME_MILLIS); + } + + void createNodes() throws Exception { + Settings settings = ImmutableSettings + .settingsBuilder() + .put("number_of_shards", 1) + .put("number_of_replicas", 0) + .put("routing.hash.type", "simple") + .put("gateway.type", "none") + .put("path.data", "target/es-test") + .build(); + + node = NodeBuilder.nodeBuilder().settings(settings).local(true).node(); + client = node.client(); + + client.admin().cluster().prepareHealth().setWaitForGreenStatus().execute() + .actionGet(); + } + + void shutdownNodes() throws Exception { + ((InternalNode) node).injector().getInstance(Gateway.class).reset(); + client.close(); + node.close(); + } + + @Before + public void setFixedJodaTime() { + DateTimeUtils.setCurrentMillisFixed(FIXED_TIME_MILLIS); + } + + @After + public void resetJodaTime() { + DateTimeUtils.setCurrentMillisSystem(); + } + + Channel bindAndStartChannel(ElasticSearchSink fixture) { + // Configure the channel + Channel channel = new MemoryChannel(); + Configurables.configure(channel, new Context()); + + // Wire them together + fixture.setChannel(channel); + fixture.start(); + return channel; + } + + void assertMatchAllQuery(int expectedHits, Event... events) { + assertSearch(expectedHits, performSearch(QueryBuilders.matchAllQuery()), + null, events); + } + + void assertBodyQuery(int expectedHits, Event... 
events) { + // Perform Multi Field Match + assertSearch(expectedHits, + performSearch(QueryBuilders.fieldQuery("@message", "event")), + null, events); + } + + SearchResponse performSearch(QueryBuilder query) { + return client.prepareSearch(timestampedIndexName) + .setTypes(DEFAULT_INDEX_TYPE).setQuery(query).execute().actionGet(); + } + + void assertSearch(int expectedHits, SearchResponse response, Map<String, Object> expectedBody, + Event... events) { + SearchHits hitResponse = response.getHits(); + assertEquals(expectedHits, hitResponse.getTotalHits()); + + SearchHit[] hits = hitResponse.getHits(); + Arrays.sort(hits, new Comparator<SearchHit>() { + @Override + public int compare(SearchHit o1, SearchHit o2) { + return o1.getSourceAsString().compareTo(o2.getSourceAsString()); + } + }); + + for (int i = 0; i < events.length; i++) { + Event event = events[i]; + SearchHit hit = hits[i]; + Map<String, Object> source = hit.getSource(); + if (expectedBody == null) { + assertEquals(new String(event.getBody()), source.get("@message")); + } else { + assertEquals(expectedBody, source.get("@message")); + } + } + } + +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchDynamicSerializer.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchDynamicSerializer.java new file mode 100644 index 0000000..d4e4654 --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchDynamicSerializer.java @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.elasticsearch; + +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.event.EventBuilder; +import org.elasticsearch.common.collect.Maps; +import org.elasticsearch.common.xcontent.XContentBuilder; +import org.junit.Test; + +import java.util.Map; + +import static org.apache.flume.sink.elasticsearch.ElasticSearchEventSerializer.charset; +import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder; +import static org.junit.Assert.assertEquals; + +public class TestElasticSearchDynamicSerializer { + + @Test + public void testRoundTrip() throws Exception { + ElasticSearchDynamicSerializer fixture = new ElasticSearchDynamicSerializer(); + Context context = new Context(); + fixture.configure(context); + + String message = "test body"; + Map<String, String> headers = Maps.newHashMap(); + headers.put("headerNameOne", "headerValueOne"); + headers.put("headerNameTwo", "headerValueTwo"); + headers.put("headerNameThree", "headerValueThree"); + Event event = EventBuilder.withBody(message.getBytes(charset)); + event.setHeaders(headers); + + XContentBuilder expected = jsonBuilder().startObject(); + expected.field("body", new String(message.getBytes(), charset)); + for (String headerName : headers.keySet()) { + expected.field(headerName, new String(headers.get(headerName).getBytes(), + charset)); + } + expected.endObject(); + + XContentBuilder actual = fixture.getContentBuilder(event); + + assertEquals(new String(expected.bytes().array()), new String(actual + .bytes().array())); + + } +} diff 
--git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchIndexRequestBuilderFactory.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchIndexRequestBuilderFactory.java new file mode 100644 index 0000000..b62254e --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchIndexRequestBuilderFactory.java @@ -0,0 +1,215 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.flume.sink.elasticsearch; + +import com.google.common.collect.Maps; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.conf.ComponentConfiguration; +import org.apache.flume.conf.sink.SinkConfiguration; +import org.apache.flume.event.SimpleEvent; +import org.elasticsearch.action.index.IndexRequestBuilder; +import org.elasticsearch.client.Client; +import org.elasticsearch.common.io.BytesStream; +import org.elasticsearch.common.io.FastByteArrayOutputStream; +import org.junit.Before; +import org.junit.Test; + +import java.io.IOException; +import java.util.Map; + +import static org.junit.Assert.assertArrayEquals; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertNull; +import static org.junit.Assert.assertTrue; + +public class TestElasticSearchIndexRequestBuilderFactory + extends AbstractElasticSearchSinkTest { + + private static final Client FAKE_CLIENT = null; + + private EventSerializerIndexRequestBuilderFactory factory; + + private FakeEventSerializer serializer; + + @Before + public void setupFactory() throws Exception { + serializer = new FakeEventSerializer(); + factory = new EventSerializerIndexRequestBuilderFactory(serializer) { + @Override + IndexRequestBuilder prepareIndex(Client client) { + return new IndexRequestBuilder(FAKE_CLIENT); + } + }; + } + + @Test + public void shouldUseUtcAsBasisForDateFormat() { + assertEquals("Coordinated Universal Time", + factory.fastDateFormat.getTimeZone().getDisplayName()); + } + + @Test + public void indexNameShouldBePrefixDashFormattedTimestamp() { + long millis = 987654321L; + assertEquals("prefix-" + factory.fastDateFormat.format(millis), + factory.getIndexName("prefix", millis)); + } + + @Test + public void shouldEnsureTimestampHeaderPresentInTimestampedEvent() { + SimpleEvent base = new SimpleEvent(); + + TimestampedEvent timestampedEvent = new TimestampedEvent(base); + 
assertEquals(FIXED_TIME_MILLIS, timestampedEvent.getTimestamp()); + assertEquals(String.valueOf(FIXED_TIME_MILLIS), + timestampedEvent.getHeaders().get("timestamp")); + } + + @Test + public void shouldUseExistingTimestampHeaderInTimestampedEvent() { + SimpleEvent base = new SimpleEvent(); + Map headersWithTimestamp = Maps.newHashMap(); + headersWithTimestamp.put("timestamp", "-321"); + base.setHeaders(headersWithTimestamp ); + + TimestampedEvent timestampedEvent = new TimestampedEvent(base); + assertEquals(-321L, timestampedEvent.getTimestamp()); + assertEquals("-321", timestampedEvent.getHeaders().get("timestamp")); + } + + @Test + public void shouldUseExistingAtTimestampHeaderInTimestampedEvent() { + SimpleEvent base = new SimpleEvent(); + Map headersWithTimestamp = Maps.newHashMap(); + headersWithTimestamp.put("@timestamp", "-999"); + base.setHeaders(headersWithTimestamp ); + + TimestampedEvent timestampedEvent = new TimestampedEvent(base); + assertEquals(-999L, timestampedEvent.getTimestamp()); + assertEquals("-999", timestampedEvent.getHeaders().get("@timestamp")); + assertNull(timestampedEvent.getHeaders().get("timestamp")); + } + + @Test + public void shouldPreserveBodyAndNonTimestampHeadersInTimestampedEvent() { + SimpleEvent base = new SimpleEvent(); + base.setBody(new byte[] {1,2,3,4}); + Map headersWithTimestamp = Maps.newHashMap(); + headersWithTimestamp.put("foo", "bar"); + base.setHeaders(headersWithTimestamp ); + + TimestampedEvent timestampedEvent = new TimestampedEvent(base); + assertEquals("bar", timestampedEvent.getHeaders().get("foo")); + assertArrayEquals(base.getBody(), timestampedEvent.getBody()); + } + + @Test + public void shouldSetIndexNameTypeAndSerializedEventIntoIndexRequest() + throws Exception { + + String indexPrefix = "qwerty"; + String indexType = "uiop"; + Event event = new SimpleEvent(); + + IndexRequestBuilder indexRequestBuilder = factory.createIndexRequest( + FAKE_CLIENT, indexPrefix, indexType, event); + + 
assertEquals(indexPrefix + '-' + + ElasticSearchIndexRequestBuilderFactory.df.format(FIXED_TIME_MILLIS), + indexRequestBuilder.request().index()); + assertEquals(indexType, indexRequestBuilder.request().type()); + assertArrayEquals(FakeEventSerializer.FAKE_BYTES, + indexRequestBuilder.request().source().array()); + } + + @Test + public void shouldSetIndexNameFromTimestampHeaderWhenPresent() + throws Exception { + String indexPrefix = "qwerty"; + String indexType = "uiop"; + Event event = new SimpleEvent(); + event.getHeaders().put("timestamp", "1213141516"); + + IndexRequestBuilder indexRequestBuilder = factory.createIndexRequest( + null, indexPrefix, indexType, event); + + assertEquals(indexPrefix + '-' + + ElasticSearchIndexRequestBuilderFactory.df.format(1213141516L), + indexRequestBuilder.request().index()); + } + + @Test + public void shouldSetIndexNameTypeFromHeaderWhenPresent() + throws Exception { + String indexPrefix = "%{index-name}"; + String indexType = "%{index-type}"; + String indexValue = "testing-index-name-from-headers"; + String typeValue = "testing-index-type-from-headers"; + + Event event = new SimpleEvent(); + event.getHeaders().put("index-name", indexValue); + event.getHeaders().put("index-type", typeValue); + + IndexRequestBuilder indexRequestBuilder = factory.createIndexRequest( + null, indexPrefix, indexType, event); + + assertEquals(indexValue + '-' + + ElasticSearchIndexRequestBuilderFactory.df.format(FIXED_TIME_MILLIS), + indexRequestBuilder.request().index()); + assertEquals(typeValue, indexRequestBuilder.request().type()); + } + + @Test + public void shouldConfigureEventSerializer() throws Exception { + assertFalse(serializer.configuredWithContext); + factory.configure(new Context()); + assertTrue(serializer.configuredWithContext); + + assertFalse(serializer.configuredWithComponentConfiguration); + factory.configure(new SinkConfiguration("name")); + assertTrue(serializer.configuredWithComponentConfiguration); + } + + static class 
FakeEventSerializer implements ElasticSearchEventSerializer { + + static final byte[] FAKE_BYTES = new byte[]{9, 8, 7, 6}; + boolean configuredWithContext; + boolean configuredWithComponentConfiguration; + + @Override + public BytesStream getContentBuilder(Event event) throws IOException { + FastByteArrayOutputStream fbaos = new FastByteArrayOutputStream(4); + fbaos.write(FAKE_BYTES); + return fbaos; + } + + @Override + public void configure(Context arg0) { + configuredWithContext = true; + } + + @Override + public void configure(ComponentConfiguration arg0) { + configuredWithComponentConfiguration = true; + } + } + +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchLogStashEventSerializer.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchLogStashEventSerializer.java new file mode 100644 index 0000000..65b4dab --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchLogStashEventSerializer.java @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.flume.sink.elasticsearch; + +import com.google.gson.JsonParser; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.event.EventBuilder; +import org.elasticsearch.common.collect.Maps; +import org.elasticsearch.common.xcontent.XContentBuilder; +import org.junit.Test; + +import java.util.Date; +import java.util.Map; + +import static org.apache.flume.sink.elasticsearch.ElasticSearchEventSerializer.charset; +import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder; +import static org.junit.Assert.assertEquals; + +public class TestElasticSearchLogStashEventSerializer { + + @Test + public void testRoundTrip() throws Exception { + ElasticSearchLogStashEventSerializer fixture = new ElasticSearchLogStashEventSerializer(); + Context context = new Context(); + fixture.configure(context); + + String message = "test body"; + Map headers = Maps.newHashMap(); + long timestamp = System.currentTimeMillis(); + headers.put("timestamp", String.valueOf(timestamp)); + headers.put("source", "flume_tail_src"); + headers.put("host", "test@localhost"); + headers.put("src_path", "/tmp/test"); + headers.put("headerNameOne", "headerValueOne"); + headers.put("headerNameTwo", "headerValueTwo"); + headers.put("type", "sometype"); + Event event = EventBuilder.withBody(message.getBytes(charset)); + event.setHeaders(headers); + + XContentBuilder expected = jsonBuilder().startObject(); + expected.field("@message", new String(message.getBytes(), charset)); + expected.field("@timestamp", new Date(timestamp)); + expected.field("@source", "flume_tail_src"); + expected.field("@type", "sometype"); + expected.field("@source_host", "test@localhost"); + expected.field("@source_path", "/tmp/test"); + + expected.startObject("@fields"); + expected.field("timestamp", String.valueOf(timestamp)); + expected.field("src_path", "/tmp/test"); + expected.field("host", "test@localhost"); + expected.field("headerNameTwo", "headerValueTwo"); 
+ expected.field("source", "flume_tail_src"); + expected.field("headerNameOne", "headerValueOne"); + expected.field("type", "sometype"); + expected.endObject(); + + expected.endObject(); + + XContentBuilder actual = fixture.getContentBuilder(event); + + JsonParser parser = new JsonParser(); + assertEquals(parser.parse(expected.string()),parser.parse(actual.string())); + } + + @Test + public void shouldHandleInvalidJSONDuringComplexParsing() throws Exception { + ElasticSearchLogStashEventSerializer fixture = new ElasticSearchLogStashEventSerializer(); + Context context = new Context(); + fixture.configure(context); + + String message = "{flume: somethingnotvalid}"; + Map headers = Maps.newHashMap(); + long timestamp = System.currentTimeMillis(); + headers.put("timestamp", String.valueOf(timestamp)); + headers.put("source", "flume_tail_src"); + headers.put("host", "test@localhost"); + headers.put("src_path", "/tmp/test"); + headers.put("headerNameOne", "headerValueOne"); + headers.put("headerNameTwo", "headerValueTwo"); + headers.put("type", "sometype"); + Event event = EventBuilder.withBody(message.getBytes(charset)); + event.setHeaders(headers); + + XContentBuilder expected = jsonBuilder().startObject(); + expected.field("@message", new String(message.getBytes(), charset)); + expected.field("@timestamp", new Date(timestamp)); + expected.field("@source", "flume_tail_src"); + expected.field("@type", "sometype"); + expected.field("@source_host", "test@localhost"); + expected.field("@source_path", "/tmp/test"); + + expected.startObject("@fields"); + expected.field("timestamp", String.valueOf(timestamp)); + expected.field("src_path", "/tmp/test"); + expected.field("host", "test@localhost"); + expected.field("headerNameTwo", "headerValueTwo"); + expected.field("source", "flume_tail_src"); + expected.field("headerNameOne", "headerValueOne"); + expected.field("type", "sometype"); + expected.endObject(); + + expected.endObject(); + + XContentBuilder actual = 
fixture.getContentBuilder(event); + + JsonParser parser = new JsonParser(); + assertEquals(parser.parse(expected.string()),parser.parse(actual.string())); + } +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchSink.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchSink.java new file mode 100644 index 0000000..69acc06 --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchSink.java @@ -0,0 +1,505 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.flume.sink.elasticsearch; + +import org.apache.commons.lang.time.FastDateFormat; +import org.apache.flume.Channel; +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.Sink.Status; +import org.apache.flume.Transaction; +import org.apache.flume.conf.ComponentConfiguration; +import org.apache.flume.conf.Configurable; +import org.apache.flume.conf.Configurables; +import org.apache.flume.event.EventBuilder; +import org.elasticsearch.action.index.IndexRequestBuilder; +import org.elasticsearch.client.Requests; +import org.elasticsearch.common.UUID; +import org.elasticsearch.common.io.BytesStream; +import org.elasticsearch.common.io.FastByteArrayOutputStream; +import org.elasticsearch.index.query.QueryBuilders; +import org.junit.After; +import org.junit.Before; +import org.junit.Test; + +import java.io.IOException; +import java.util.HashMap; +import java.util.Map; +import java.util.TimeZone; +import java.util.concurrent.TimeUnit; + +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.BATCH_SIZE; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.CLUSTER_NAME; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.HOSTNAMES; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.INDEX_NAME; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.INDEX_TYPE; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.SERIALIZER; +import static org.apache.flume.sink.elasticsearch.ElasticSearchSinkConstants.TTL; +import static org.junit.Assert.assertArrayEquals; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertNull; +import static org.junit.Assert.assertTrue; + +public class TestElasticSearchSink extends AbstractElasticSearchSinkTest { + + private ElasticSearchSink fixture; + + @Before + public void init() throws Exception { + initDefaults(); + 
createNodes(); + fixture = new ElasticSearchSink(true); + fixture.setName("ElasticSearchSink-" + UUID.randomUUID().toString()); + } + + @After + public void tearDown() throws Exception { + shutdownNodes(); + } + + @Test + public void shouldIndexOneEvent() throws Exception { + Configurables.configure(fixture, new Context(parameters)); + Channel channel = bindAndStartChannel(fixture); + + Transaction tx = channel.getTransaction(); + tx.begin(); + Event event = EventBuilder.withBody("event #1 or 1".getBytes()); + channel.put(event); + tx.commit(); + tx.close(); + + fixture.process(); + fixture.stop(); + client.admin().indices() + .refresh(Requests.refreshRequest(timestampedIndexName)).actionGet(); + + assertMatchAllQuery(1, event); + assertBodyQuery(1, event); + } + + @Test + public void shouldIndexInvalidComplexJsonBody() throws Exception { + parameters.put(BATCH_SIZE, "3"); + Configurables.configure(fixture, new Context(parameters)); + Channel channel = bindAndStartChannel(fixture); + + Transaction tx = channel.getTransaction(); + tx.begin(); + Event event1 = EventBuilder.withBody("TEST1 {test}".getBytes()); + channel.put(event1); + Event event2 = EventBuilder.withBody("{test: TEST2 }".getBytes()); + channel.put(event2); + Event event3 = EventBuilder.withBody("{\"test\":{ TEST3 {test} }}".getBytes()); + channel.put(event3); + tx.commit(); + tx.close(); + + fixture.process(); + fixture.stop(); + client.admin().indices() + .refresh(Requests.refreshRequest(timestampedIndexName)).actionGet(); + + assertMatchAllQuery(3); + assertSearch(1, + performSearch(QueryBuilders.fieldQuery("@message", "TEST1")), + null, event1); + assertSearch(1, + performSearch(QueryBuilders.fieldQuery("@message", "TEST2")), + null, event2); + assertSearch(1, + performSearch(QueryBuilders.fieldQuery("@message", "TEST3")), + null, event3); + } + + @Test + public void shouldIndexComplexJsonEvent() throws Exception { + Configurables.configure(fixture, new Context(parameters)); + Channel channel = 
bindAndStartChannel(fixture); + + Transaction tx = channel.getTransaction(); + tx.begin(); + Event event = EventBuilder.withBody( + "{\"event\":\"json content\",\"num\":1}".getBytes()); + channel.put(event); + tx.commit(); + tx.close(); + + fixture.process(); + fixture.stop(); + client.admin().indices() + .refresh(Requests.refreshRequest(timestampedIndexName)).actionGet(); + + Map expectedBody = new HashMap(); + expectedBody.put("event", "json content"); + expectedBody.put("num", 1); + + assertSearch(1, + performSearch(QueryBuilders.matchAllQuery()), expectedBody, event); + assertSearch(1, + performSearch(QueryBuilders.fieldQuery("@message.event", "json")), + expectedBody, event); + } + + @Test + public void shouldIndexFiveEvents() throws Exception { + // Make it so we only need to call process once + parameters.put(BATCH_SIZE, "5"); + Configurables.configure(fixture, new Context(parameters)); + Channel channel = bindAndStartChannel(fixture); + + int numberOfEvents = 5; + Event[] events = new Event[numberOfEvents]; + + Transaction tx = channel.getTransaction(); + tx.begin(); + for (int i = 0; i < numberOfEvents; i++) { + String body = "event #" + i + " of " + numberOfEvents; + Event event = EventBuilder.withBody(body.getBytes()); + events[i] = event; + channel.put(event); + } + tx.commit(); + tx.close(); + + fixture.process(); + fixture.stop(); + client.admin().indices() + .refresh(Requests.refreshRequest(timestampedIndexName)).actionGet(); + + assertMatchAllQuery(numberOfEvents, events); + assertBodyQuery(5, events); + } + + @Test + public void shouldIndexFiveEventsOverThreeBatches() throws Exception { + parameters.put(BATCH_SIZE, "2"); + Configurables.configure(fixture, new Context(parameters)); + Channel channel = bindAndStartChannel(fixture); + + int numberOfEvents = 5; + Event[] events = new Event[numberOfEvents]; + + Transaction tx = channel.getTransaction(); + tx.begin(); + for (int i = 0; i < numberOfEvents; i++) { + String body = "event #" + i + " of " + 
numberOfEvents; + Event event = EventBuilder.withBody(body.getBytes()); + events[i] = event; + channel.put(event); + } + tx.commit(); + tx.close(); + + int count = 0; + Status status = Status.READY; + while (status != Status.BACKOFF) { + count++; + status = fixture.process(); + } + fixture.stop(); + + assertEquals(3, count); + + client.admin().indices() + .refresh(Requests.refreshRequest(timestampedIndexName)).actionGet(); + assertMatchAllQuery(numberOfEvents, events); + assertBodyQuery(5, events); + } + + @Test + public void shouldParseConfiguration() { + parameters.put(HOSTNAMES, "10.5.5.27"); + parameters.put(CLUSTER_NAME, "testing-cluster-name"); + parameters.put(INDEX_NAME, "testing-index-name"); + parameters.put(INDEX_TYPE, "testing-index-type"); + parameters.put(TTL, "10"); + + fixture = new ElasticSearchSink(); + fixture.configure(new Context(parameters)); + + String[] expected = { "10.5.5.27" }; + + assertEquals("testing-cluster-name", fixture.getClusterName()); + assertEquals("testing-index-name", fixture.getIndexName()); + assertEquals("testing-index-type", fixture.getIndexType()); + assertEquals(TimeUnit.DAYS.toMillis(10), fixture.getTTLMs()); + assertArrayEquals(expected, fixture.getServerAddresses()); + } + + @Test + public void shouldParseConfigurationUsingDefaults() { + parameters.put(HOSTNAMES, "10.5.5.27"); + parameters.remove(INDEX_NAME); + parameters.remove(INDEX_TYPE); + parameters.remove(CLUSTER_NAME); + + fixture = new ElasticSearchSink(); + fixture.configure(new Context(parameters)); + + String[] expected = { "10.5.5.27" }; + + assertEquals(DEFAULT_INDEX_NAME, fixture.getIndexName()); + assertEquals(DEFAULT_INDEX_TYPE, fixture.getIndexType()); + assertEquals(DEFAULT_CLUSTER_NAME, fixture.getClusterName()); + assertArrayEquals(expected, fixture.getServerAddresses()); + } + + @Test + public void shouldParseMultipleHostUsingDefaultPorts() { + parameters.put(HOSTNAMES, "10.5.5.27,10.5.5.28,10.5.5.29"); + + fixture = new ElasticSearchSink(); + 
fixture.configure(new Context(parameters)); + + String[] expected = { "10.5.5.27", "10.5.5.28", "10.5.5.29" }; + + assertArrayEquals(expected, fixture.getServerAddresses()); + } + + @Test + public void shouldParseMultipleHostWithWhitespacesUsingDefaultPorts() { + parameters.put(HOSTNAMES, " 10.5.5.27 , 10.5.5.28 , 10.5.5.29 "); + + fixture = new ElasticSearchSink(); + fixture.configure(new Context(parameters)); + + String[] expected = { "10.5.5.27", "10.5.5.28", "10.5.5.29" }; + + assertArrayEquals(expected, fixture.getServerAddresses()); + } + + @Test + public void shouldParseMultipleHostAndPorts() { + parameters.put(HOSTNAMES, "10.5.5.27:9300,10.5.5.28:9301,10.5.5.29:9302"); + + fixture = new ElasticSearchSink(); + fixture.configure(new Context(parameters)); + + String[] expected = { "10.5.5.27:9300", "10.5.5.28:9301", "10.5.5.29:9302" }; + + assertArrayEquals(expected, fixture.getServerAddresses()); + } + + @Test + public void shouldParseMultipleHostAndPortsWithWhitespaces() { + parameters.put(HOSTNAMES, + " 10.5.5.27 : 9300 , 10.5.5.28 : 9301 , 10.5.5.29 : 9302 "); + + fixture = new ElasticSearchSink(); + fixture.configure(new Context(parameters)); + + String[] expected = { "10.5.5.27:9300", "10.5.5.28:9301", "10.5.5.29:9302" }; + + assertArrayEquals(expected, fixture.getServerAddresses()); + } + + @Test + public void shouldAllowCustomElasticSearchIndexRequestBuilderFactory() + throws Exception { + parameters.put(SERIALIZER, + CustomElasticSearchIndexRequestBuilderFactory.class.getName()); + + fixture.configure(new Context(parameters)); + + Channel channel = bindAndStartChannel(fixture); + Transaction tx = channel.getTransaction(); + tx.begin(); + String body = "{ foo: \"bar\" }"; + Event event = EventBuilder.withBody(body.getBytes()); + channel.put(event); + tx.commit(); + tx.close(); + + fixture.process(); + fixture.stop(); + + assertEquals(fixture.getIndexName() + "-05_17_36_789", + CustomElasticSearchIndexRequestBuilderFactory.actualIndexName); + 
assertEquals(fixture.getIndexType(), + CustomElasticSearchIndexRequestBuilderFactory.actualIndexType); + assertArrayEquals(event.getBody(), + CustomElasticSearchIndexRequestBuilderFactory.actualEventBody); + assertTrue(CustomElasticSearchIndexRequestBuilderFactory.hasContext); + } + + @Test + public void shouldParseFullyQualifiedTTLs() { + Map<String, Long> testTTLMap = new HashMap<String, Long>(); + testTTLMap.put("1ms", Long.valueOf(1)); + testTTLMap.put("1s", Long.valueOf(1000)); + testTTLMap.put("1m", Long.valueOf(60000)); + testTTLMap.put("1h", Long.valueOf(3600000)); + testTTLMap.put("1d", Long.valueOf(86400000)); + testTTLMap.put("1w", Long.valueOf(604800000)); + testTTLMap.put("1", Long.valueOf(86400000)); + + parameters.put(HOSTNAMES, "10.5.5.27"); + parameters.put(CLUSTER_NAME, "testing-cluster-name"); + parameters.put(INDEX_NAME, "testing-index-name"); + parameters.put(INDEX_TYPE, "testing-index-type"); + + for (String ttl : testTTLMap.keySet()) { + parameters.put(TTL, ttl); + fixture = new ElasticSearchSink(); + fixture.configure(new Context(parameters)); + + String[] expected = { "10.5.5.27" }; + assertEquals("testing-cluster-name", fixture.getClusterName()); + assertEquals("testing-index-name", fixture.getIndexName()); + assertEquals("testing-index-type", fixture.getIndexType()); + assertEquals((long) testTTLMap.get(ttl), fixture.getTTLMs()); + assertArrayEquals(expected, fixture.getServerAddresses()); + + } + } + + public static final class CustomElasticSearchIndexRequestBuilderFactory + extends AbstractElasticSearchIndexRequestBuilderFactory { + + static String actualIndexName; + static String actualIndexType; + static byte[] actualEventBody; + static boolean hasContext; + + public CustomElasticSearchIndexRequestBuilderFactory() { + super(FastDateFormat.getInstance("HH_mm_ss_SSS", TimeZone.getTimeZone("EST5EDT"))); + } + + @Override + protected void prepareIndexRequest(IndexRequestBuilder indexRequest, String indexName, + String indexType, Event event) throws IOException { 
actualIndexName = indexName; + actualIndexType = indexType; + actualEventBody = event.getBody(); + indexRequest.setIndex(indexName).setType(indexType).setSource(event.getBody()); + } + + @Override + public void configure(Context arg0) { + hasContext = true; + } + + @Override + public void configure(ComponentConfiguration arg0) { + //no-op + } + } + + @Test + public void shouldFailToConfigureWithInvalidSerializerClass() + throws Exception { + + parameters.put(SERIALIZER, "java.lang.String"); + try { + Configurables.configure(fixture, new Context(parameters)); + } catch (ClassCastException e) { + // expected + } + + parameters.put(SERIALIZER, FakeConfigurable.class.getName()); + try { + Configurables.configure(fixture, new Context(parameters)); + } catch (IllegalArgumentException e) { + // expected + } + } + + @Test + public void shouldUseSpecifiedSerializer() throws Exception { + Context context = new Context(); + context.put(SERIALIZER, + "org.apache.flume.sink.elasticsearch.FakeEventSerializer"); + + assertNull(fixture.getEventSerializer()); + fixture.configure(context); + assertTrue(fixture.getEventSerializer() instanceof FakeEventSerializer); + } + + @Test + public void shouldUseSpecifiedIndexNameBuilder() throws Exception { + Context context = new Context(); + context.put(ElasticSearchSinkConstants.INDEX_NAME_BUILDER, + "org.apache.flume.sink.elasticsearch.FakeIndexNameBuilder"); + + assertNull(fixture.getIndexNameBuilder()); + fixture.configure(context); + assertTrue(fixture.getIndexNameBuilder() instanceof FakeIndexNameBuilder); + } + + public static class FakeConfigurable implements Configurable { + @Override + public void configure(Context arg0) { + // no-op + } + } +} + +/** + * Internal class. 
Fake event serializer used for tests + */ +class FakeEventSerializer implements ElasticSearchEventSerializer { + + static final byte[] FAKE_BYTES = new byte[] { 9, 8, 7, 6 }; + boolean configuredWithContext; + boolean configuredWithComponentConfiguration; + + @Override + public BytesStream getContentBuilder(Event event) throws IOException { + FastByteArrayOutputStream fbaos = new FastByteArrayOutputStream(4); + fbaos.write(FAKE_BYTES); + return fbaos; + } + + @Override + public void configure(Context arg0) { + configuredWithContext = true; + } + + @Override + public void configure(ComponentConfiguration arg0) { + configuredWithComponentConfiguration = true; + } +} + +/** + * Internal class. Fake index name builder used only for tests. + */ +class FakeIndexNameBuilder implements IndexNameBuilder { + + static final String INDEX_NAME = "index_name"; + + @Override + public String getIndexName(Event event) { + return INDEX_NAME; + } + + @Override + public String getIndexPrefix(Event event) { + return INDEX_NAME; + } + + @Override + public void configure(Context context) { + } + + @Override + public void configure(ComponentConfiguration conf) { + } +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchSinkCreation.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchSinkCreation.java new file mode 100644 index 0000000..2a36439 --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchSinkCreation.java @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.elasticsearch; + +import org.apache.flume.FlumeException; +import org.apache.flume.Sink; +import org.apache.flume.SinkFactory; +import org.apache.flume.sink.DefaultSinkFactory; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +public class TestElasticSearchSinkCreation { + + private SinkFactory sinkFactory; + + @Before + public void setUp() { + sinkFactory = new DefaultSinkFactory(); + } + + private void verifySinkCreation(String name, String type, + Class typeClass) throws FlumeException { + Sink sink = sinkFactory.create(name, type); + Assert.assertNotNull(sink); + Assert.assertTrue(typeClass.isInstance(sink)); + } + + @Test + public void testSinkCreation() { + verifySinkCreation("elasticsearch-sink", "elasticsearch", ElasticSearchSink.class); + } +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TimeBasedIndexNameBuilderTest.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TimeBasedIndexNameBuilderTest.java new file mode 100644 index 0000000..678342a --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TimeBasedIndexNameBuilderTest.java @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more 
contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.elasticsearch; + +import org.apache.flume.Context; +import org.apache.flume.Event; +import org.apache.flume.event.SimpleEvent; +import org.junit.Before; +import org.junit.Test; + +import java.util.HashMap; +import java.util.Map; + +import static org.junit.Assert.assertEquals; + +public class TimeBasedIndexNameBuilderTest { + + private TimeBasedIndexNameBuilder indexNameBuilder; + + @Before + public void setUp() throws Exception { + Context context = new Context(); + context.put(ElasticSearchSinkConstants.INDEX_NAME, "prefix"); + indexNameBuilder = new TimeBasedIndexNameBuilder(); + indexNameBuilder.configure(context); + } + + @Test + public void shouldUseUtcAsBasisForDateFormat() { + assertEquals("Coordinated Universal Time", + indexNameBuilder.getFastDateFormat().getTimeZone().getDisplayName()); + } + + @Test + public void indexNameShouldBePrefixDashFormattedTimestamp() { + long time = 987654321L; + Event event = new SimpleEvent(); + Map<String, String> headers = new HashMap<String, String>(); + headers.put("timestamp", Long.toString(time)); + event.setHeaders(headers); + assertEquals("prefix-" + indexNameBuilder.getFastDateFormat().format(time), + indexNameBuilder.getIndexName(event)); + } +} diff --git 
a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TimestampedEventTest.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TimestampedEventTest.java new file mode 100644 index 0000000..bef2ac6 --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TimestampedEventTest.java @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.flume.sink.elasticsearch; + +import com.google.common.collect.Maps; +import org.apache.flume.event.SimpleEvent; +import org.joda.time.DateTimeUtils; +import org.junit.Before; +import org.junit.Test; + +import java.util.Map; + +import static org.junit.Assert.assertArrayEquals; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertNull; + +public class TimestampedEventTest { + static final long FIXED_TIME_MILLIS = 123456789L; + + @Before + public void setFixedJodaTime() { + DateTimeUtils.setCurrentMillisFixed(FIXED_TIME_MILLIS); + } + + @Test + public void shouldEnsureTimestampHeaderPresentInTimestampedEvent() { + SimpleEvent base = new SimpleEvent(); + + TimestampedEvent timestampedEvent = new TimestampedEvent(base); + assertEquals(FIXED_TIME_MILLIS, timestampedEvent.getTimestamp()); + assertEquals(String.valueOf(FIXED_TIME_MILLIS), + timestampedEvent.getHeaders().get("timestamp")); + } + + @Test + public void shouldUseExistingTimestampHeaderInTimestampedEvent() { + SimpleEvent base = new SimpleEvent(); + Map<String, String> headersWithTimestamp = Maps.newHashMap(); + headersWithTimestamp.put("timestamp", "-321"); + base.setHeaders(headersWithTimestamp); + + TimestampedEvent timestampedEvent = new TimestampedEvent(base); + assertEquals(-321L, timestampedEvent.getTimestamp()); + assertEquals("-321", timestampedEvent.getHeaders().get("timestamp")); + } + + @Test + public void shouldUseExistingAtTimestampHeaderInTimestampedEvent() { + SimpleEvent base = new SimpleEvent(); + Map<String, String> headersWithTimestamp = Maps.newHashMap(); + headersWithTimestamp.put("@timestamp", "-999"); + base.setHeaders(headersWithTimestamp); + + TimestampedEvent timestampedEvent = new TimestampedEvent(base); + assertEquals(-999L, timestampedEvent.getTimestamp()); + assertEquals("-999", timestampedEvent.getHeaders().get("@timestamp")); + assertNull(timestampedEvent.getHeaders().get("timestamp")); + } + + @Test + public void 
shouldPreserveBodyAndNonTimestampHeadersInTimestampedEvent() { + SimpleEvent base = new SimpleEvent(); + base.setBody(new byte[] {1,2,3,4}); + Map<String, String> headersWithTimestamp = Maps.newHashMap(); + headersWithTimestamp.put("foo", "bar"); + base.setHeaders(headersWithTimestamp); + + TimestampedEvent timestampedEvent = new TimestampedEvent(base); + assertEquals("bar", timestampedEvent.getHeaders().get("foo")); + assertArrayEquals(base.getBody(), timestampedEvent.getBody()); + } +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/client/RoundRobinListTest.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/client/RoundRobinListTest.java new file mode 100644 index 0000000..0d1d092 --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/client/RoundRobinListTest.java @@ -0,0 +1,42 @@ +/* + * Copyright 2014 Apache Software Foundation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flume.sink.elasticsearch.client; + +import java.util.Arrays; +import org.junit.Before; +import org.junit.Test; + +import static org.junit.Assert.assertEquals; + +public class RoundRobinListTest { + + private RoundRobinList fixture; + + @Before + public void setUp() { + fixture = new RoundRobinList(Arrays.asList("test1", "test2")); + } + + @Test + public void shouldReturnNextElement() { + assertEquals("test1", fixture.get()); + assertEquals("test2", fixture.get()); + assertEquals("test1", fixture.get()); + assertEquals("test2", fixture.get()); + assertEquals("test1", fixture.get()); + } +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/client/TestElasticSearchClientFactory.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/client/TestElasticSearchClientFactory.java new file mode 100644 index 0000000..c3f07b0 --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/client/TestElasticSearchClientFactory.java @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.flume.sink.elasticsearch.client; + +import org.apache.flume.sink.elasticsearch.ElasticSearchEventSerializer; +import org.junit.Before; +import org.junit.Test; +import org.mockito.Mock; + +import static org.hamcrest.core.IsInstanceOf.instanceOf; +import static org.junit.Assert.assertThat; +import static org.mockito.MockitoAnnotations.initMocks; + +public class TestElasticSearchClientFactory { + + ElasticSearchClientFactory factory; + + @Mock + ElasticSearchEventSerializer serializer; + + @Before + public void setUp() { + initMocks(this); + factory = new ElasticSearchClientFactory(); + } + + @Test + public void shouldReturnTransportClient() throws Exception { + String[] hostNames = { "127.0.0.1" }; + Object o = factory.getClient(ElasticSearchClientFactory.TransportClient, + hostNames, "test", serializer, null); + assertThat(o, instanceOf(ElasticSearchTransportClient.class)); + } + + @Test + public void shouldReturnRestClient() throws NoSuchClientTypeException { + String[] hostNames = { "127.0.0.1" }; + Object o = factory.getClient(ElasticSearchClientFactory.RestClient, + hostNames, "test", serializer, null); + assertThat(o, instanceOf(ElasticSearchRestClient.class)); + } + + @Test(expected = NoSuchClientTypeException.class) + public void shouldThrowNoSuchClientTypeException() throws NoSuchClientTypeException { + String[] hostNames = { "127.0.0.1" }; + factory.getClient("not_existing_client", hostNames, "test", null, null); + } +} diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/client/TestElasticSearchRestClient.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/client/TestElasticSearchRestClient.java new file mode 100644 index 0000000..9551c81 --- /dev/null +++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/client/TestElasticSearchRestClient.java @@ -0,0 +1,180 @@ +/* + * 
Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.flume.sink.elasticsearch.client; + +import com.google.common.base.Splitter; +import com.google.gson.JsonObject; +import com.google.gson.JsonParser; +import org.apache.flume.Event; +import org.apache.flume.EventDeliveryException; +import org.apache.flume.sink.elasticsearch.ElasticSearchEventSerializer; +import org.apache.flume.sink.elasticsearch.IndexNameBuilder; +import org.apache.http.HttpEntity; +import org.apache.http.HttpResponse; +import org.apache.http.HttpStatus; +import org.apache.http.StatusLine; +import org.apache.http.client.HttpClient; +import org.apache.http.client.methods.HttpPost; +import org.apache.http.client.methods.HttpUriRequest; +import org.apache.http.util.EntityUtils; +import org.elasticsearch.common.bytes.BytesArray; +import org.elasticsearch.common.bytes.BytesReference; +import org.elasticsearch.common.io.BytesStream; +import org.junit.Before; +import org.junit.Test; +import org.mockito.ArgumentCaptor; +import org.mockito.Mock; + +import java.io.IOException; +import java.util.Iterator; +import java.util.List; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; +import 
static org.mockito.Mockito.any; +import static org.mockito.Mockito.isA; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.times; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.when; +import static org.mockito.MockitoAnnotations.initMocks; + +public class TestElasticSearchRestClient { + + private ElasticSearchRestClient fixture; + + @Mock + private ElasticSearchEventSerializer serializer; + + @Mock + private IndexNameBuilder nameBuilder; + + @Mock + private Event event; + + @Mock + private HttpClient httpClient; + + @Mock + private HttpResponse httpResponse; + + @Mock + private StatusLine httpStatus; + + @Mock + private HttpEntity httpEntity; + + private static final String INDEX_NAME = "foo_index"; + private static final String MESSAGE_CONTENT = "{\"body\":\"test\"}"; + private static final String[] HOSTS = {"host1", "host2"}; + + @Before + public void setUp() throws IOException { + initMocks(this); + BytesReference bytesReference = mock(BytesReference.class); + BytesStream bytesStream = mock(BytesStream.class); + + when(nameBuilder.getIndexName(any(Event.class))).thenReturn(INDEX_NAME); + when(bytesReference.toBytesArray()).thenReturn(new BytesArray(MESSAGE_CONTENT)); + when(bytesStream.bytes()).thenReturn(bytesReference); + when(serializer.getContentBuilder(any(Event.class))).thenReturn(bytesStream); + fixture = new ElasticSearchRestClient(HOSTS, serializer, httpClient); + } + + @Test + public void shouldAddNewEventWithoutTTL() throws Exception { + ArgumentCaptor<HttpPost> argument = ArgumentCaptor.forClass(HttpPost.class); + + when(httpStatus.getStatusCode()).thenReturn(HttpStatus.SC_OK); + when(httpResponse.getStatusLine()).thenReturn(httpStatus); + when(httpClient.execute(any(HttpUriRequest.class))).thenReturn(httpResponse); + + fixture.addEvent(event, nameBuilder, "bar_type", -1); + fixture.execute(); + + verify(httpClient).execute(isA(HttpUriRequest.class)); + verify(httpClient).execute(argument.capture()); + + 
assertEquals("http://host1/_bulk", argument.getValue().getURI().toString());
+    assertTrue(verifyJsonEvents("{\"index\":{\"_type\":\"bar_type\", \"_index\":\"foo_index\"}}\n",
+        MESSAGE_CONTENT, EntityUtils.toString(argument.getValue().getEntity())));
+  }
+
+  @Test
+  public void shouldAddNewEventWithTTL() throws Exception {
+    ArgumentCaptor<HttpPost> argument = ArgumentCaptor.forClass(HttpPost.class);
+
+    when(httpStatus.getStatusCode()).thenReturn(HttpStatus.SC_OK);
+    when(httpResponse.getStatusLine()).thenReturn(httpStatus);
+    when(httpClient.execute(any(HttpUriRequest.class))).thenReturn(httpResponse);
+
+    fixture.addEvent(event, nameBuilder, "bar_type", 123);
+    fixture.execute();
+
+    verify(httpClient).execute(isA(HttpUriRequest.class));
+    verify(httpClient).execute(argument.capture());
+
+    assertEquals("http://host1/_bulk", argument.getValue().getURI().toString());
+    assertTrue(verifyJsonEvents(
+        "{\"index\":{\"_type\":\"bar_type\",\"_index\":\"foo_index\",\"_ttl\":\"123\"}}\n",
+        MESSAGE_CONTENT, EntityUtils.toString(argument.getValue().getEntity())));
+  }
+
+  private boolean verifyJsonEvents(String expectedIndex, String expectedBody, String actual) {
+    Iterator<String> it = Splitter.on("\n").split(actual).iterator();
+    JsonParser parser = new JsonParser();
+    JsonObject[] arr = new JsonObject[2];
+    for (int i = 0; i < 2; i++) {
+      arr[i] = (JsonObject) parser.parse(it.next());
+    }
+    return arr[0].equals(parser.parse(expectedIndex)) && arr[1].equals(parser.parse(expectedBody));
+  }
+
+  @Test(expected = EventDeliveryException.class)
+  public void shouldThrowEventDeliveryException() throws Exception {
+    ArgumentCaptor<HttpPost> argument = ArgumentCaptor.forClass(HttpPost.class);
+
+    when(httpStatus.getStatusCode()).thenReturn(HttpStatus.SC_INTERNAL_SERVER_ERROR);
+    when(httpResponse.getStatusLine()).thenReturn(httpStatus);
+    when(httpClient.execute(any(HttpUriRequest.class))).thenReturn(httpResponse);
+
+    fixture.addEvent(event, nameBuilder, "bar_type", 123);
+    fixture.execute();
+  }
+
+
@Test()
+  public void shouldRetryBulkOperation() throws Exception {
+    ArgumentCaptor<HttpPost> argument = ArgumentCaptor.forClass(HttpPost.class);
+
+    when(httpStatus.getStatusCode()).thenReturn(HttpStatus.SC_INTERNAL_SERVER_ERROR,
+        HttpStatus.SC_OK);
+    when(httpResponse.getStatusLine()).thenReturn(httpStatus);
+    when(httpClient.execute(any(HttpUriRequest.class))).thenReturn(httpResponse);
+
+    fixture.addEvent(event, nameBuilder, "bar_type", 123);
+    fixture.execute();
+
+    verify(httpClient, times(2)).execute(isA(HttpUriRequest.class));
+    verify(httpClient, times(2)).execute(argument.capture());
+
+    List<HttpPost> allValues = argument.getAllValues();
+    assertEquals("http://host1/_bulk", allValues.get(0).getURI().toString());
+    assertEquals("http://host2/_bulk", allValues.get(1).getURI().toString());
+  }
+}
diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/client/TestElasticSearchTransportClient.java b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/client/TestElasticSearchTransportClient.java
new file mode 100644
index 0000000..b7b8e74
--- /dev/null
+++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/client/TestElasticSearchTransportClient.java
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.
You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.flume.sink.elasticsearch.client;
+
+import org.apache.flume.Event;
+import org.apache.flume.EventDeliveryException;
+import org.apache.flume.sink.elasticsearch.ElasticSearchEventSerializer;
+import org.apache.flume.sink.elasticsearch.IndexNameBuilder;
+import org.elasticsearch.action.ListenableActionFuture;
+import org.elasticsearch.action.bulk.BulkRequestBuilder;
+import org.elasticsearch.action.bulk.BulkResponse;
+import org.elasticsearch.action.index.IndexRequestBuilder;
+import org.elasticsearch.client.Client;
+import org.elasticsearch.common.bytes.BytesReference;
+import org.elasticsearch.common.io.BytesStream;
+import org.junit.Before;
+import org.junit.Test;
+import org.mockito.Mock;
+
+import java.io.IOException;
+
+import static org.mockito.Matchers.any;
+import static org.mockito.Matchers.anyString;
+import static org.mockito.Mockito.*;
+import static org.mockito.MockitoAnnotations.initMocks;
+
+public class TestElasticSearchTransportClient {
+
+  private ElasticSearchTransportClient fixture;
+
+  @Mock
+  private ElasticSearchEventSerializer serializer;
+
+  @Mock
+  private IndexNameBuilder nameBuilder;
+
+  @Mock
+  private Client elasticSearchClient;
+
+  @Mock
+  private BulkRequestBuilder bulkRequestBuilder;
+
+  @Mock
+  private IndexRequestBuilder indexRequestBuilder;
+
+  @Mock
+  private Event event;
+
+  @Before
+  public void setUp() throws IOException {
+    initMocks(this);
+    BytesReference bytesReference = mock(BytesReference.class);
+    BytesStream bytesStream = mock(BytesStream.class);
+
+
when(nameBuilder.getIndexName(any(Event.class))).thenReturn("foo_index");
+    when(bytesReference.toBytes()).thenReturn("{\"body\":\"test\"}".getBytes());
+    when(bytesStream.bytes()).thenReturn(bytesReference);
+    when(serializer.getContentBuilder(any(Event.class)))
+        .thenReturn(bytesStream);
+    when(elasticSearchClient.prepareIndex(anyString(), anyString()))
+        .thenReturn(indexRequestBuilder);
+    when(indexRequestBuilder.setSource(bytesReference)).thenReturn(
+        indexRequestBuilder);
+
+    fixture = new ElasticSearchTransportClient(elasticSearchClient, serializer);
+    fixture.setBulkRequestBuilder(bulkRequestBuilder);
+  }
+
+  @Test
+  public void shouldAddNewEventWithoutTTL() throws Exception {
+    fixture.addEvent(event, nameBuilder, "bar_type", -1);
+    verify(indexRequestBuilder).setSource(
+        serializer.getContentBuilder(event).bytes());
+    verify(bulkRequestBuilder).add(indexRequestBuilder);
+  }
+
+  @Test
+  public void shouldAddNewEventWithTTL() throws Exception {
+    fixture.addEvent(event, nameBuilder, "bar_type", 10);
+    verify(indexRequestBuilder).setTTL(10);
+    verify(indexRequestBuilder).setSource(
+        serializer.getContentBuilder(event).bytes());
+  }
+
+  @Test
+  public void shouldExecuteBulkRequestBuilder() throws Exception {
+    ListenableActionFuture<BulkResponse> action =
+        (ListenableActionFuture<BulkResponse>) mock(ListenableActionFuture.class);
+    BulkResponse response = mock(BulkResponse.class);
+    when(bulkRequestBuilder.execute()).thenReturn(action);
+    when(action.actionGet()).thenReturn(response);
+    when(response.hasFailures()).thenReturn(false);
+
+    fixture.addEvent(event, nameBuilder, "bar_type", 10);
+    fixture.execute();
+    verify(bulkRequestBuilder).execute();
+  }
+
+  @Test(expected = EventDeliveryException.class)
+  public void shouldThrowExceptionOnExecuteFailed() throws Exception {
+    ListenableActionFuture<BulkResponse> action =
+        (ListenableActionFuture<BulkResponse>) mock(ListenableActionFuture.class);
+    BulkResponse response = mock(BulkResponse.class);
+    when(bulkRequestBuilder.execute()).thenReturn(action);
+
when(action.actionGet()).thenReturn(response);
+    when(response.hasFailures()).thenReturn(true);
+
+    fixture.addEvent(event, nameBuilder, "bar_type", 10);
+    fixture.execute();
+  }
+}
diff --git a/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/resources/log4j.properties b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/resources/log4j.properties
new file mode 100644
index 0000000..9036aca
--- /dev/null
+++ b/code/flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/resources/log4j.properties
@@ -0,0 +1,25 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+log4j.rootLogger = DEBUG, out
+
+log4j.appender.out = org.apache.log4j.ConsoleAppender
+log4j.appender.out.layout = org.apache.log4j.PatternLayout
+log4j.appender.out.layout.ConversionPattern = %d (%t) [%p - %l] %m%n
+
+log4j.logger.org.apache.flume = DEBUG
diff --git a/code/flume-ng-sinks/flume-ng-hbase-sink/.idea/artifacts/flume_ng_hbase_sink_jar.xml b/code/flume-ng-sinks/flume-ng-hbase-sink/.idea/artifacts/flume_ng_hbase_sink_jar.xml
new file mode 100644
index 0000000..f3e9b44
--- /dev/null
+++ b/code/flume-ng-sinks/flume-ng-hbase-sink/.idea/artifacts/flume_ng_hbase_sink_jar.xml
@@ -0,0 +1,8 @@
+ $PROJECT_DIR$/out/artifacts/flume_ng_hbase_sink_jar
\ No newline at end of file
diff --git a/code/flume-ng-sinks/flume-ng-hbase-sink/.idea/compiler.xml b/code/flume-ng-sinks/flume-ng-hbase-sink/.idea/compiler.xml
new file mode 100644
index 0000000..6e72b1f
--- /dev/null
+++ b/code/flume-ng-sinks/flume-ng-hbase-sink/.idea/compiler.xml
@@ -0,0 +1,13 @@
\ No newline at end of file
diff --git a/code/flume-ng-sinks/flume-ng-hbase-sink/.idea/encodings.xml b/code/flume-ng-sinks/flume-ng-hbase-sink/.idea/encodings.xml
new file mode 100644
index 0000000..b26911b
--- /dev/null
+++ b/code/flume-ng-sinks/flume-ng-hbase-sink/.idea/encodings.xml
@@ -0,0 +1,6 @@
\ No newline at end of file
diff --git a/code/flume-ng-sinks/flume-ng-hbase-sink/.idea/misc.xml b/code/flume-ng-sinks/flume-ng-hbase-sink/.idea/misc.xml
new file mode 100644
index 0000000..4b661a5
--- /dev/null
+++ b/code/flume-ng-sinks/flume-ng-hbase-sink/.idea/misc.xml
@@ -0,0 +1,14 @@
\ No newline at end of file
diff --git a/code/flume-ng-sinks/flume-ng-hbase-sink/.idea/workspace.xml b/code/flume-ng-sinks/flume-ng-hbase-sink/.idea/workspace.xml
new file mode 100644
index 0000000..dd63465
--- /dev/null
+++ b/code/flume-ng-sinks/flume-ng-hbase-sink/.idea/workspace.xml
@@ -0,0 +1,435 @@