Apache Issues I Follow (issues.apache.org)

An archive of Apache issues that interest me. Some are important issues in projects I follow; others are ones I was personally involved with. Most of them are Improvements or New Features.

(Continuously updated…)

 

1. Map-Reduce 2.0

https://issues.apache.org/jira/browse/MAPREDUCE-279

This issue proposed the idea of MapReduce 2.0. Previously most of my understanding was of MR 1.0; now it's time to gradually dig into 2.0 as well.

Description: Re-factor MapReduce into a generic resource scheduler and a per-job, user-defined component that manages the application execution.

The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons.

Related resources:

hadoop_contributors_meet_07_01_2011.pdf

MapReduce_NextGen_Architecture.pdf
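To make the split concrete, here is a minimal sketch of how a client talks to the generic resource scheduler. It assumes the YarnClient API that eventually shipped with YARN, not anything from the original JIRA patches, so treat it as an illustration of the architecture rather than the issue's own code.

// A minimal sketch (assuming the later YARN client API) of the MRv2 split:
// the ResourceManager is the generic resource scheduler, and the per-job,
// user-defined logic lives in an ApplicationMaster submitted separately.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class Mrv2ClientSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();

    // Ask the ResourceManager (generic resource scheduler) for a new application.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationId appId = app.getNewApplicationResponse().getApplicationId();
    System.out.println("Got application id from ResourceManager: " + appId);

    // A real job would now fill in an ApplicationSubmissionContext pointing at
    // its own ApplicationMaster (the per-job component) and call
    // yarnClient.submitApplication(...); that part is omitted here.
    yarnClient.stop();
  }
}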

2. Add ElasticIndexerJob that indexes to elasticsearch

https://issues.apache.org/jira/browse/NUTCH-1445

Nutch support for Elasticsearch; I haven't had time to try it out in detail yet.

We have created a new indexer job ElasticIndexerJob that indexes to elasticsearch. It is originally based upon https://github.com/ctjmorgan/nutch-elasticsearch-indexer (Apache2 license), but we have modified it greatly to make it integrate as well as possible into Nutch. The greatest modification is that documents are asynchronously flushed in bulk to elasticsearch.
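The bulk-flush idea looks roughly like the following. This is only a sketch of the pattern using the old Elasticsearch transport-client API, with an illustrative index name and document maps; it is not Nutch's actual ElasticIndexerJob code.

// A minimal sketch of bulk indexing: documents are buffered client-side and
// sent to Elasticsearch in one bulk request instead of one round trip each.
import java.util.Map;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;

public class BulkIndexSketch {
  // The "nutch" index/type names and the document maps are illustrative only.
  static void flush(Client client, Iterable<Map<String, Object>> docs) {
    BulkRequestBuilder bulk = client.prepareBulk();
    for (Map<String, Object> doc : docs) {
      // Queue each document locally; nothing is sent yet.
      bulk.add(client.prepareIndex("nutch", "doc").setSource(doc));
    }
    // A single network round trip flushes the whole batch.
    BulkResponse response = bulk.execute().actionGet();
    if (response.hasFailures()) {
      System.err.println(response.buildFailureMessage());
    }
  }
}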

3. Give clients access to the write buffer

https://issues.apache.org/jira/browse/HBASE-1968

An HBase issue my former team reported while working on GMS; we reported it to a committer on the US team. The bug was not hard to find and the problem was easy to reproduce: a Put submitted to HBase contained a nonexistent column family, and because that bad Put could not be removed from the write buffer, every subsequent Put failed as well. It felt like a fairly low-level bug. The fix at the time also didn't really look like a fix for the bug itself, just a workaround that gave users write access to what had been an internal, private buffer so that they could do the error handling themselves. Honestly, HBase's quality back then was nothing to admire, or at least that was my impression at the time.

Seeing that the issue description quotes my own email verbatim still feels pretty good. I hope to have the chance to participate more closely and deeply on an ongoing basis.

From a Trend dev team:

When inserting rows into a table by calling the method public synchronized void put(final Put put), if the column family of one row does not exist, the insert operation fails and throws NoSuchColumnFamilyException. We observed that all the following insert operations fail as well, even though all of them have valid column families. That is, one exception in an insert operation can cause failure of all the following insert operations.
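The scenario from the bug report, and the getWriteBuffer() workaround that HBASE-1968 ended up exposing, look roughly like this. The sketch assumes the old (pre-1.0) HTable client API and an illustrative table name and column families.

// A minimal sketch: with autoFlush off, a Put against a nonexistent column
// family sits in the client-side write buffer and keeps breaking later flushes.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteBufferSketch {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "mytable"); // illustrative name
    table.setAutoFlush(false); // puts accumulate in the client-side write buffer

    Put bad = new Put(Bytes.toBytes("row1"));
    bad.add(Bytes.toBytes("no_such_cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    table.put(bad); // buffered; the server rejects it with NoSuchColumnFamilyException on flush

    Put good = new Put(Bytes.toBytes("row2"));
    good.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    table.put(good);

    try {
      table.flushCommits(); // the bad Put makes the whole flush fail...
    } catch (Exception e) {
      // ...and before HBASE-1968 it stayed in the buffer, so every later flush
      // failed too. The workaround: inspect and clear the buffer directly.
      table.getWriteBuffer().clear();
    }
    table.close();
  }
}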

4. FindBugs and javac warnings cleanup

https://issues.apache.org/jira/browse/HBASE-1916

Another HBase issue my former team reported while working on GMS. While running static code analysis on the team's own code, we also scanned the HBase and Hadoop code we were using. Quite a few real bugs turned up in HBase; we verified the main serious ones by reading the code, filtered them, and reported them to a committer on the US team. Most of the Hadoop findings, by contrast, were style or bad-practice improvements. The quality gap between HBase and Hadoop was quite obvious, especially at the level of detail you notice when reading a project's source code.

FindBugs was run on the 0.20 branch code recently and produced ~400 warnings, which were presented to me in a meeting. So I installed the Eclipse FindBugs plugin, analyzed both trunk and 0.20 branch, and systematically went through and addressed the warnings. About a quarter of them were incorrect (spurious, pointless, etc.). Of the remainder, most were stylistic. A few were real bugs. Attached are big patches against trunk and branch which roll up all of the changes. They are too numerous to separate out and mostly do not warrant special attention.

All tests pass with patches applied.

There are a few real bugs fixed:
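The JIRA entry goes on to list the specific fixes, which I won't reproduce here. As a generic illustration of the kind of "real bug" FindBugs flags among the mostly stylistic warnings (not taken from the HBASE-1916 patch), consider an ignored return value from a method on an immutable object:

// A generic example of a FindBugs "real bug" (pattern RV_RETURN_VALUE_IGNORED):
// String is immutable, so dropping the result of replace() loses the change.
public class FindBugsExample {
  // Buggy: the replace() result is silently thrown away.
  static String stripDotsBroken(String name) {
    name.replace('.', '_');
    return name; // still contains the dots
  }

  // Fixed: keep the value that replace() returns.
  static String stripDotsFixed(String name) {
    return name.replace('.', '_');
  }

  public static void main(String[] args) {
    System.out.println(stripDotsBroken("a.b.c")); // prints "a.b.c"
    System.out.println(stripDotsFixed("a.b.c"));  // prints "a_b_c"
  }
}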

5. MultiGet, MultiDelete, and MultiPut – batched to the appropriate region servers

https://issues.apache.org/jira/browse/HBASE-1845

The discussion of the HBASE-1968 problem inside this issue is what drew my attention to it.

I've started to create a general interface for doing these batch/multi calls and would like to get some input and thoughts about how we should handle this and what the protocol should look like.
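For reference, the client-side batch API that eventually grew out of this line of work lets you hand the table a mixed list of operations and have the library group them by region server. The sketch below assumes that later HTable.batch() API and illustrative table/row names, not the exact protocol debated in the JIRA.

// A minimal sketch: one batch() call carries Gets, Puts, and Deletes together;
// the client library routes and batches them to the appropriate region servers.
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiOpsSketch {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "mytable"); // illustrative names

    List<Row> actions = new ArrayList<Row>();
    actions.add(new Get(Bytes.toBytes("row1")));
    Put put = new Put(Bytes.toBytes("row2"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    actions.add(put);
    actions.add(new Delete(Bytes.toBytes("row3")));

    // One call; results come back positionally for each action.
    Object[] results = new Object[actions.size()];
    table.batch(actions, results);
    table.close();
  }
}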
