【Java 7源碼分析第10篇

Mis林桑 2017-04-03

展開全文

可能我們經(jīng)常在實際應用或面試中注重的是各個集合的特點，比如，Set集合不能存儲重復元素，不能保持插入和大小順序，Map集合存儲key-value對等等。但是，如果你想學好Java，或者目標更高一點，想要精通Java，那僅掌握這些是遠遠不夠的。其實集合中還有需多的知道需要我們?nèi)フ莆蘸土私狻?/p>

我們首先來看Map集合的實現(xiàn)而不是Set集合，因為Map和Set集合實現(xiàn)上非常的相似，Set中許多方法都是通過調(diào)用Map中的方法來實現(xiàn)功能的。為什么呢？因為Map可以說就是一個Set，Set集合不重復且無序、Map中的key也有這個特性。當我們將Map中的value看為key的附屬時，獲取到的所有key就可以組成一個Set集合。

來看一下兩者實現(xiàn)的類圖：

Set 集合框架圖

Map集合框架圖

先來看一下Map接口，源代碼如下：

public interface Map { // Query Operations int size(); boolean isEmpty(); boolean containsKey(Object key); boolean containsValue(Object value); V get(Object key); // Modification Operations V put(K key, V value); V remove(Object key); // Bulk Operations /* The behavior of this operation is undefined if the specified map is modified while the operation is in progress. */ void putAll(Map m); void clear(); // Views Set keySet();// 由于Map集合的key不能重復，key之間無順序，所以Map集合中的所有key就可以組成一個Set集合 Collection values(); Set<>> entrySet(); interface Entry { K getKey(); V getValue(); V setValue(V value); boolean equals(Object o); int hashCode(); } // Comparison and hashing boolean equals(Object o); int hashCode();}

接口中有一個values方法，通過調(diào)用這個方法就可以返回Map集合中所有的value值；有一個keySet()方法，調(diào)用后可以得到所有Map中的 key值；調(diào)用entrySet()方法得到所有的Map中key-value對，以Set集合的形式存儲。為了能夠更好的表示這個key-value值，接口中還定義了一個Entry接口，并且在這個接口中定義了一些操作key和value的方法。

public class HashMap extends AbstractMap implements Map, Cloneable, Serializable{ // The default initial capacity - MUST be a power of two. static final int DEFAULT_INITIAL_CAPACITY = 16; static final int MAXIMUM_CAPACITY = 1 < 30;="" static="" final="" float="" default_load_factor="0.75f;//指定負載因子" the="" table,="" resized="" as="" necessary.="" length="" must="" always="" be="" a="" power="" of="" two.="" 使用entry數(shù)組來存儲key-value，類似于arraylist用object[]來存儲集合元素="" transient="" entry[]="" table;="" transient="" int="" size;="" the="" next="" size="" value="" at="" which="" to="" resize="" (capacity="" *="" load="" factor).="" hashmap所能容的mapping的極限="" int="" threshold;="" *="" 負載因子:="" */="" final="" float="" loadfactor;="" transient="" int="">

如上定義了一些重要的變量，其中的loadFactor是負載因子，增大值時可以減少Hash表（也就是Entry數(shù)組）所占用的內(nèi)存空間，但會增加查詢數(shù)據(jù)時時間的開銷，而查詢是最頻繁的操作，減小值時會提高數(shù)據(jù)查詢的性能，但是會增大Hash表所占用的內(nèi)存空間，所以一般是默認的0.75。
threshold表示HashMap所能容納的key-value對極限，如果存儲的size數(shù)大于了threshold，則需要擴容了。
hashmap提供了幾個構(gòu)造函數(shù)，如下：

// 健HashMap的實際容量肯定大于等于initialCapacity，當這個值恰好為2的n次方時正好等于 public HashMap(int initialCapacity, float loadFactor) { if (initialCapacity < 0)="" throw="" new="" illegalargumentexception('illegal="" initial="" capacity:="" '="" +="" initialcapacity);="" if="" (initialcapacity=""> MAXIMUM_CAPACITY) initialCapacity = MAXIMUM_CAPACITY; if (loadFactor <= 0="" ||="" float.isnan(loadfactor))="" throw="" new="" illegalargumentexception('illegal="" load="" factor:="" '="" +="" loadfactor);="" find="" a="" power="" of="" 2="">= initialCapacity int capacity = 1; while (capacity < initialcapacity)="" 計算出大于initialcapacity的最小的2的n次方值="" capacity=""><= 1;="" this.loadfactor="loadFactor;" threshold="(int)(capacity" *="" loadfactor);//設(shè)置容量極限="" table="new" entry[capacity];="" init();="" }="" public="" hashmap(int="" initialcapacity)="" {="" this(initialcapacity,="" default_load_factor);="" }="" public="" hashmap()="" {="" this.loadfactor="DEFAULT_LOAD_FACTOR;" threshold="(int)(DEFAULT_INITIAL_CAPACITY" *="" default_load_factor);="" table="new" entry[default_initial_capacity];="" init();="" }="" public=""> m) { this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR); putAllForCreate(m); }

由第一個構(gòu)造函數(shù)可以看出，其實際的capacity一般是大于我們指定的initialCapacity，除非initialCapacity正好是2的n次方值。接著就來說一下HashMap的實現(xiàn)原理吧。在這個類中開始處理定義了一個transient Entry[] 數(shù)組，這個Entry的實現(xiàn)如下：

private static class Entry implements Map.Entry { int hash; K key; V value; Entry next; protected Entry(int hash, K key, V value, Entry next) { this.hash = hash; this.key = key; this.value = value; this.next = next; } protected Object clone() { return new Entry<>(hash, key, value,(next==null ? null : (Entry) next.clone())); } // Map.Entry Ops public K getKey() { return key; } public V getValue() { return value; } public V setValue(V value) { if (value == null) throw new NullPointerException(); V oldValue = this.value; this.value = value; return oldValue; } public boolean equals(Object o) { if (!(o instanceof Map.Entry)) return false; Map.Entry e = (Map.Entry)o; return (key==null ? e.getKey()==null : key.equals(e.getKey())) && (value==null ? e.getValue()==null : value.equals(e.getValue())); } public int hashCode() { return hash ^ (value==null ? 0 : value.hashCode()); } public String toString() { return key.toString()+'='+value.toString(); } }

了解了key-value存儲的基本結(jié)構(gòu)后，就可以考慮如何存儲的問題了。HashMap顧名思義就是使用哈希表來存儲的，哈希表為解決沖突，采用了開放地址法和鏈地址法來解決問題。Java中HashMap采用了鏈地址法。鏈地址法，簡單來說，就是數(shù)組加鏈表的結(jié)合。在每個數(shù)組元素上都一個鏈表結(jié)構(gòu)，當數(shù)據(jù)被hash后，得到數(shù)組下標，把數(shù)據(jù)放在對應下標元素的鏈表上。當程序試圖將多個 key-value 放入 HashMap 中時，以如下代碼片段為例：

HashMap map = new HashMap(); map.put('語文' , 80.0); map.put('數(shù)學' , 89.0); map.put('英語' , 78.2);

HashMap 采用一種所謂的“Hash 算法”來決定每個元素的存儲位置。
當程序執(zhí)行 map.put('語文' , 80.0); 時，系統(tǒng)將調(diào)用'語文'的 hashCode() 方法得到其 hashCode 值——每個 Java 對象都有 hashCode() 方法，都可通過該方法獲得它的 hashCode 值。得到這個對象的 hashCode 值之后，系統(tǒng)會根據(jù)該 hashCode 值來決定該元素的存儲位置。

我們可以看 HashMap 類的 put(K key , V value) 方法的源代碼：

// 當向HashMap中添加mapping時，由key的hashCode值決定Entry對象的存儲位置，當兩個 key的hashCode相同時，// 通過equals()方法比較，返回false產(chǎn)生Entry鏈，true時采用覆蓋行為 public V put(K key, V value) { if (key == null) return putForNullKey(value); int hash = hash(key.hashCode()); int i = indexFor(hash, table.length); for (Entry e = table[i]; e != null; e = e.next) { Object k; if (e.hash == hash && ((k = e.key) == key || key.equals(k))) { V oldValue = e.value; e.value = value; e.recordAccess(this); return oldValue; } } modCount++; addEntry(hash, key, value, i); return null; }

當系統(tǒng)決定存儲 HashMap 中的 key-value 對時，完全沒有考慮 Entry 中的 value，僅僅只是根據(jù) key 來計算并決定每個 Entry 的存儲位置。這也說明了前面的結(jié)論：我們完全可以把 Map 集合中的 value 當成 key 的附屬，當系統(tǒng)決定了 key 的存儲位置之后，value 隨之保存在那里即可。
上面方法提供了一個根據(jù) hashCode() 返回值來計算 Hash 碼的方法：并且會調(diào)用 indexFor() 方法來計算該對象應該保存在 table 數(shù)組的哪個索引，源代碼如下：

static int hash(int h) { // This function ensures that hashCodes that differ only by // constant multiples at each bit position have a bounded // number of collisions (approximately 8 at default load factor). h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4); } static int indexFor(int h, int length) { return h & (length-1); }

length是table.length，所以數(shù)組的長度總是 2 的 n 次方，通過h&(length-1)計算后，保證計算得到的索引值位于 table 數(shù)組的索引之內(nèi)，計算的函數(shù)并不總是這樣的，好的計算函數(shù)應該還會讓索引值盡量平均分布到數(shù)組中。

當調(diào)用put()方法向 HashMap 中添加 key-value 對，由其 key 的 hashCode() 返回值決定該 key-value 對（就是Entry 對象）的存儲位置。當兩個 Entry 對象的 key 的 hashCode() 返回值相同時，將由 key 通過 eqauls() 比較值決定是采用覆蓋行為（返回 true），還是產(chǎn)生 Entry 鏈（返回 false），而且新添加的 Entry 位于 Entry 鏈的頭部，這是通過調(diào)用addEntry()方法來完成的，源碼如下：

void addEntry(int hash, K key, V value, int bucketIndex){ Entry e = table[bucketIndex]; // 獲取指定 bucketIndex 索引處的 Entry table[bucketIndex] = new Entry(hash, key, value, e); // 將新創(chuàng)建的 Entry 放入 bucketIndex 索引處，并讓新的 Entry 指向原來的 Entry // 如果 Map 中的 key-value 對的數(shù)量超過了極限 if (size++ >= threshold) resize(2 * table.length); // 把 table 對象的長度擴充到 2 倍}

程序總是將新添加的 Entry 對象放入 table數(shù)組的 bucketIndex 索引處——如果 bucketIndex 索引處已經(jīng)有了一個 Entry 對象，那新添加的 Entry 對象指向原有的 Entry 對象（產(chǎn)生一個 Entry 鏈），如果 bucketIndex 索引處沒有 Entry 對象， e 變量是 null，也就是新放入的 Entry 對象指向 null，也就是沒有產(chǎn)生 Entry 鏈。

接下來看一下HashMap是如何讀取的。 get()方法的源代碼如下：

public V get(Object key) { if (key == null) return getForNullKey(); int hash = hash(key.hashCode()); // 搜索該Entry鏈的下一個Entry，有多個Entry鏈時必須順序遍歷，降低了索引的速度 // 如果Entry鏈過長，說明發(fā)生“Hash”沖突比較頻繁，需要采用新的算法或增大空間 for (Entry e = table[indexFor(hash, table.length)]; e != null; e = e.next) { Object k; if (e.hash == hash && ((k = e.key) == key || key.equals(k))) return e.value; } 　 return null; }

當 HashMap 的每個 bucket 里存儲的 Entry 只是單個 Entry 時的 HashMap 具有最好的性能：當程序通過 key 取出對應 value 時，只要先計算出該 key 的 hashCode() 返回值，在根據(jù)該 hashCode 返回值找出該 key 在 table 數(shù)組中的索引，然后循環(huán)遍歷查找 hash值相同,key值相同的value。

下面來看一下HashMap是如何實現(xiàn)如下的三個方法的，源代碼如下：

public Set keySet() { Set ks = keySet; return (ks != null ? ks : (keySet = new KeySet())); } public Collection values() { Collection vs = values; return (vs != null ? vs : (values = new Values())); } public Set<>> entrySet() { return entrySet0(); } private Set<>> entrySet0() { Set<>> es = entrySet; return es != null ? es : (entrySet = new EntrySet()); }

分別得到了KeySet、Values和EntrySet私有類的實例，那么他們是怎么從HashMap中取出這些值的呢？其實這里會涉及到非常多的類和方法，大概框架如下所示：

如上類中最重要的就是HashEntry類的實現(xiàn)，如下：

private abstract class HashIterator implements Iterator { Entry next; // next entry to return int expectedModCount; // For fast-fail int index; // current slot Entry current; // current entry HashIterator() { expectedModCount = modCount; if (size > 0) { // advance to first entry Entry[] t = table; while (index < t.length="" &&="" (next="t[index++])" =="null);//" 將index指向第一個table不為null的位置="" }="" }="" public="" final="" boolean="" hasnext()="" {="" return="" next="" !="null;" }="" final=""> nextEntry() {// 遍歷Entry鏈 if (modCount != expectedModCount) throw new ConcurrentModificationException(); Entry e = next; if (e == null) throw new NoSuchElementException(); if ((next = e.next) == null) { Entry[] t = table; while (index < t.length="" &&="" (next="t[index++])" =="null);" }="" current="e;" return="" e;="" }="" public="" void="" remove()="" {="" if="" (current="=" null)="" throw="" new="" illegalstateexception();="" if="" (modcount="" !="expectedModCount)" throw="" new="" concurrentmodificationexception();="" object="" k="current.key;" current="null;" hashmap.this.removeentryforkey(k);="" expectedmodcount="modCount;" }="">

和ArrayList一樣，在獲取到HashMap的Iterator對象后，就不可以使用ArrayList進行添加或刪除的操作了，否則會出現(xiàn)異常?？匆幌聨讉€重要的變量，如圖示。