Web CAPTCHA Design and Cracking, as Seen Through the New 12306 CAPTCHA

新華書店好書榜 2015-04-24


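On March 16, 2015, the official railway ticketing site 12306 made another move and introduced a new verification step on its login page: after entering a username and password, users must also select the correct pictures in an image CAPTCHA before login succeeds. Reportedly, since the CAPTCHA change, none of the existing ticket-grabbing tools can log in.

What dreadful news. The major internet companies are surely hard at work on new ticket-grabbing assistants that can crack the new CAPTCHA scheme.

Below, let's look at how various kinds of CAPTCHA are designed and how each is cracked.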

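First is the plain-text CAPTCHA, one of the more primitive kinds.

Strictly speaking, this kind does not meet the definition of a CAPTCHA, because only automatically generated questions qualify, and these text questions are all picked from a question bank of limited size. Cracking it is simple: refresh the page a number of times to build up the question bank and the matching answers, scrape the question from the page with a regular expression, then look up the answer. Some sites use randomly generated arithmetic instead, in the form "random number, random operator from [+-*/], random number, =?", which a programmer of grade-school ability can handle...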
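The random-arithmetic variant described above can be solved with a single regular expression. The following is a minimal sketch, assuming the question appears in the page as plain text such as `12 + 7 = ?`. The class name `ArithmeticCaptchaSolver` and the exact question format are illustrative assumptions, not taken from any particular site.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ArithmeticCaptchaSolver {
    // Integer, one operator from + - * /, integer, then "=?" (spaces optional).
    // Operands are capped at 9 digits so Integer.parseInt cannot overflow.
    private static final Pattern QUESTION =
            Pattern.compile("(\\d{1,9})\\s*([+\\-*/])\\s*(\\d{1,9})\\s*=\\s*\\?");

    /** Returns the answer, or null if no question is found or it divides by zero. */
    static Integer solve(String html) {
        Matcher m = QUESTION.matcher(html);
        if (!m.find()) {
            return null;
        }
        int a = Integer.parseInt(m.group(1));
        int b = Integer.parseInt(m.group(3));
        switch (m.group(2).charAt(0)) {
            case '+':
                return a + b;
            case '-':
                return a - b;
            case '*':
                return a * b;
            case '/':
                if (b == 0) {
                    return null;
                }
                return a / b; // integer division, an assumption about the site
            default:
                return null;
        }
    }
}
```

A question-bank CAPTCHA can be handled the same way, with the `switch` replaced by a lookup in a map from scraped question text to recorded answer.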

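This kind of CAPTCHA is not entirely useless. A spam bot that fires at every form it sees has no reason to put in that much effort for a single site. Against someone determined to flood your site with junk posts, though, it is as good as having no CAPTCHA at all.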

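Second is the image CAPTCHA, currently the mainstream kind:

Image CAPTCHAs of this type work by making characters touch and overlap, which makes machine recognition harder. The one shown above is typically used on smaller sites.

How this type of CAPTCHA is processed:

Image preprocessing

How do we remove the background noise? Notice that each digit or letter in the CAPTCHA is drawn in a single color, so we split the CAPTCHA evenly into 5 parts

and compute the color distribution of each region. Apart from white, the most frequent color value is the color of the character, so the background is easy to remove.

Code: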

// isWhite(rgb) is a helper from the surrounding class that returns 1 for
// background-white pixels.
public static BufferedImage removeBackgroud(String picFile)
        throws Exception {
    BufferedImage img = ImageIO.read(new File(picFile));
    // Trim the 1-pixel border around the image.
    img = img.getSubimage(1, 1, img.getWidth() - 2, img.getHeight() - 2);
    int width = img.getWidth();
    int height = img.getHeight();
    // Split the image into 5 vertical strips of equal width.
    double subWidth = (double) width / 5.0;
    for (int i = 0; i < 5; i++) {
        // Histogram of the non-white colors in this strip: RGB -> pixel count.
        Map<Integer, Integer> map = new HashMap<Integer, Integer>();
        for (int x = (int) (1 + i * subWidth); x < (i + 1) * subWidth
                && x < width - 1; ++x) {
            for (int y = 0; y < height; ++y) {
                int rgb = img.getRGB(x, y);
                if (isWhite(rgb) == 1)
                    continue;
                if (map.containsKey(rgb)) {
                    map.put(rgb, map.get(rgb) + 1);
                } else {
                    map.put(rgb, 1);
                }
            }
        }
        // The most frequent non-white color is taken as the character color.
        int max = 0;
        int colorMax = 0;
        for (Integer color : map.keySet()) {
            if (max < map.get(color)) {
                max = map.get(color);
                colorMax = color;
            }
        }
        // Binarize the strip: character color -> black, everything else -> white.
        for (int x = (int) (1 + i * subWidth); x < (i + 1) * subWidth
                && x < width - 1; ++x) {
            for (int y = 0; y < height; ++y) {
                if (img.getRGB(x, y) != colorMax) {
                    img.setRGB(x, y, Color.WHITE.getRGB());
                } else {
                    img.setRGB(x, y, Color.BLACK.getRGB());
                }
            }
        }
    }
    return img;
}
This produces the image below.

Next, scan the image column by column and cut it into pieces.

Then scan each piece row by row.
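The two scans can be illustrated on a tiny binarized grid. This is a sketch under stated assumptions rather than the article's implementation: the image is represented as a `boolean[][]` in which `true` marks a character pixel, any column with no character pixel counts as a gap, and the names `ColumnSplitter`, `parse`, `split`, and `rowBounds` are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnSplitter {
    /** Builds a grid from text rows; '#' marks a character pixel (demo helper). */
    static boolean[][] parse(String... rows) {
        boolean[][] ink = new boolean[rows.length][];
        for (int y = 0; y < rows.length; y++) {
            ink[y] = new boolean[rows[y].length()];
            for (int x = 0; x < rows[y].length(); x++) {
                ink[y][x] = rows[y].charAt(x) == '#';
            }
        }
        return ink;
    }

    /** Column scan: returns {startColumn, endColumnExclusive} for each character. */
    static List<int[]> split(boolean[][] ink) {
        List<int[]> ranges = new ArrayList<>();
        int height = ink.length;
        int width = height == 0 ? 0 : ink[0].length;
        int start = -1; // -1 means the scan is currently inside a gap
        // Scanning one column past the edge closes a run that touches the border.
        for (int x = 0; x <= width; x++) {
            boolean hasInk = false;
            for (int y = 0; x < width && y < height; y++) {
                if (ink[y][x]) {
                    hasInk = true;
                    break;
                }
            }
            if (hasInk && start < 0) {
                start = x;
            } else if (!hasInk && start >= 0) {
                ranges.add(new int[] {start, x});
                start = -1;
            }
        }
        return ranges;
    }

    /** Row scan within columns [x0, x1): returns {topRow, bottomRowExclusive}. */
    static int[] rowBounds(boolean[][] ink, int x0, int x1) {
        int top = -1;
        int bottom = -1;
        for (int y = 0; y < ink.length; y++) {
            for (int x = x0; x < x1; x++) {
                if (ink[y][x]) {
                    if (top < 0) {
                        top = y;
                    }
                    bottom = y + 1;
                    break;
                }
            }
        }
        return new int[] {top, bottom};
    }
}
```

Characters that touch each other form a single run of columns under this scheme, which is why overlapping characters are harder to segment than characters separated by clean gaps.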

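Then build the training set.

Finally, because the characters have a fixed size, recognition works the same way as in CAPTCHA Recognition--1: a pixel-by-pixel comparison is enough.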
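The pixel comparison amounts to nearest-template matching: count the positions where the unknown glyph differs from each labeled sample and keep the label with the fewest differences. A minimal sketch follows, assuming all glyphs have the same fixed size and are given as text rows where `#` is a character pixel. The class `TemplateMatcher` and its method names are hypothetical.

```java
import java.util.Map;

final class TemplateMatcher {
    /** Counts the positions at which two equally sized glyphs differ. */
    static int mismatches(String[] a, String[] b) {
        int count = 0;
        for (int y = 0; y < a.length; y++) {
            for (int x = 0; x < a[y].length(); x++) {
                if (a[y].charAt(x) != b[y].charAt(x)) {
                    count++;
                }
            }
        }
        return count;
    }

    /** Returns the label of the template with the fewest mismatches, or null if none. */
    static String classify(String[] glyph, Map<String, String[]> templates) {
        String best = null;
        int bestScore = Integer.MAX_VALUE;
        for (Map.Entry<String, String[]> e : templates.entrySet()) {
            int score = mismatches(glyph, e.getValue());
            if (score < bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

With several samples per character, the same loop works unchanged if the map is keyed by sample name and the label is derived from it, which is how the listing in this article uses the first character of each training file name.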

Source code:

import java.awt.Color;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.imageio.ImageIO;

// Apache Commons HttpClient 3.x and Commons IO.
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.io.IOUtils;

public class ImagePreProcess2 {

    // Cache of labeled training glyphs: glyph image -> the character it shows.
    private static Map<BufferedImage, String> trainMap = null;
    // Running counter that keeps training file names unique.
    private static int index = 0;

    // Returns 1 if the pixel is near-black (R + G + B <= 100), otherwise 0.
    public static int isBlack(int colorInt) {
        Color color = new Color(colorInt);
        if (color.getRed() + color.getGreen() + color.getBlue() <= 100) {
            return 1;
        }
        return 0;
    }

    // Returns 1 for any pixel brighter than near-black (R + G + B > 100).
    // The methods below count these pixels as character strokes.
    public static int isWhite(int colorInt) {
        Color color = new Color(colorInt);
        if (color.getRed() + color.getGreen() + color.getBlue() > 100) {
            return 1;
        }
        return 0;
    }

    // This CAPTCHA needs no background removal; the image is only loaded.
    public static BufferedImage removeBackgroud(String picFile)
            throws Exception {
        BufferedImage img = ImageIO.read(new File(picFile));
        return img;
    }

    // Crops the empty rows above and below a glyph (the row scan).
    public static BufferedImage removeBlank(BufferedImage img) throws Exception {
        int width = img.getWidth();
        int height = img.getHeight();
        int start = 0;
        int end = 0;
        // First row from the top that contains a stroke pixel.
        Label1: for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                if (isWhite(img.getRGB(x, y)) == 1) {
                    start = y;
                    break Label1;
                }
            }
        }
        // First row from the bottom that contains a stroke pixel.
        Label2: for (int y = height - 1; y >= 0; --y) {
            for (int x = 0; x < width; ++x) {
                if (isWhite(img.getRGB(x, y)) == 1) {
                    end = y;
                    break Label2;
                }
            }
        }
        return img.getSubimage(0, start, width, end - start + 1);
    }

    // Cuts the CAPTCHA into single-character images (the column scan).
    public static List<BufferedImage> splitImage(BufferedImage img)
            throws Exception {
        List<BufferedImage> subImgs = new ArrayList<BufferedImage>();
        int width = img.getWidth();
        int height = img.getHeight();
        // weightlist.get(x) = number of stroke pixels in column x.
        List<Integer> weightlist = new ArrayList<Integer>();
        for (int x = 0; x < width; ++x) {
            int count = 0;
            for (int y = 0; y < height; ++y) {
                if (isWhite(img.getRGB(x, y)) == 1) {
                    count++;
                }
            }
            weightlist.add(count);
        }
        for (int i = 0; i < weightlist.size();) {
            // Measure a run of consecutive columns with more than 1 stroke
            // pixel. The bounds check keeps a run that touches the right edge
            // from reading past the end of the list.
            int start = i;
            while (i < weightlist.size() && weightlist.get(i) > 1) {
                i++;
            }
            int length = i - start;
            i++; // step over the gap column that ended the run
            if (length > 12) {
                // Too wide for one character: treat the run as two touching
                // characters and cut it in half.
                subImgs.add(removeBlank(img.getSubimage(start, 0,
                        length / 2, height)));
                subImgs.add(removeBlank(img.getSubimage(start + length
                        - length / 2, 0, length / 2, height)));
            } else if (length > 3) {
                // Runs of 3 columns or fewer are discarded as noise.
                subImgs.add(removeBlank(img.getSubimage(start, 0,
                        length, height)));
            }
        }
        return subImgs;
    }

    // Loads the labeled glyphs from "train2"; the first character of each
    // file name is the label.
    public static Map<BufferedImage, String> loadTrainData() throws Exception {
        if (trainMap == null) {
            Map<BufferedImage, String> map = new HashMap<BufferedImage, String>();
            File dir = new File("train2");
            File[] files = dir.listFiles();
            if (files != null) { // listFiles() returns null if the directory is missing
                for (File file : files) {
                    map.put(ImageIO.read(file), file.getName().charAt(0) + "");
                }
            }
            trainMap = map;
        }
        return trainMap;
    }

    // Returns the label of the training glyph that differs from img in the
    // fewest pixels, compared over the area the two images share.
    public static String getSingleCharOcr(BufferedImage img,
            Map<BufferedImage, String> map) {
        String result = "";
        int width = img.getWidth();
        int height = img.getHeight();
        int min = width * height;
        for (BufferedImage bi : map.keySet()) {
            int count = 0;
            int widthmin = Math.min(width, bi.getWidth());
            int heightmin = Math.min(height, bi.getHeight());
            Label1: for (int x = 0; x < widthmin; ++x) {
                for (int y = 0; y < heightmin; ++y) {
                    if (isWhite(img.getRGB(x, y)) != isWhite(bi.getRGB(x, y))) {
                        count++;
                        // Already no better than the best match so far.
                        if (count >= min)
                            break Label1;
                    }
                }
            }
            if (count < min) {
                min = count;
                result = map.get(bi);
            }
        }
        return result;
    }

    // Recognizes a whole CAPTCHA file and saves a copy named after the result.
    public static String getAllOcr(String file) throws Exception {
        BufferedImage img = removeBackgroud(file);
        List<BufferedImage> listImg = splitImage(img);
        Map<BufferedImage, String> map = loadTrainData();
        String result = "";
        for (BufferedImage bi : listImg) {
            result += getSingleCharOcr(bi, map);
        }
        // The "result2" directory must already exist.
        ImageIO.write(img, "JPG", new File("result2/" + result + ".jpg"));
        return result;
    }

    // Downloads 30 sample CAPTCHAs into "img2".
    public static void downloadImage() {
        HttpClient httpClient = new HttpClient();
        for (int i = 0; i < 30; i++) {
            // The host name is elided in the original listing.
            GetMethod getMethod = new GetMethod("http://www./img.php?key="
                    + (2000 + i));
            try {
                // Execute the GET request.
                int statusCode = httpClient.executeMethod(getMethod);
                if (statusCode != HttpStatus.SC_OK) {
                    System.err.println("Method failed: "
                            + getMethod.getStatusLine());
                    continue; // do not save an error page as an image
                }
                // Read the response body and write it to the image file.
                String picName = "img2/" + i + ".jpg";
                try (InputStream inputStream = getMethod.getResponseBodyAsStream();
                        OutputStream outStream = new FileOutputStream(picName)) {
                    IOUtils.copy(inputStream, outStream);
                }
                System.out.println(i + "OK!");
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                // Release the connection.
                getMethod.releaseConnection();
            }
        }
    }

    // Builds the training set from hand-labeled CAPTCHAs in "temp": each file
    // name starts with the 4 characters the image shows.
    public static void trainData() throws Exception {
        File dir = new File("temp");
        File[] files = dir.listFiles();
        if (files == null) {
            return; // directory missing
        }
        for (File file : files) {
            BufferedImage img = removeBackgroud("temp/" + file.getName());
            List<BufferedImage> listImg = splitImage(img);
            // Keep only images that were cut into exactly 4 characters.
            if (listImg.size() == 4) {
                for (int j = 0; j < listImg.size(); ++j) {
                    ImageIO.write(listImg.get(j), "JPG", new File("train2/"
                            + file.getName().charAt(j) + "-" + (index++)
                            + ".jpg"));
                }
            }
        }
    }

    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        // downloadImage();
        for (int i = 0; i < 30; ++i) {
            String text = getAllOcr("img2/" + i + ".jpg");
            System.out.println(i + ".jpg = " + text);
        }
    }
}
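CAPTCHAs from giants like BAT (Baidu, Alibaba, Tencent) add interference lines, mix bold and regular weights, use common Chinese characters (there are roughly 5,000 of them, with complex strokes and many lookalikes, which makes them far harder than the 26 letters), mix typefaces such as Kai, Song, and YouYuan, use pinyin, distort the glyphs, and require 13 Chinese characters to be recognized correctly. All of this greatly raises the probability of failure.

Besides the mainstream image CAPTCHAs, some sites offer audio CAPTCHAs to accommodate users with poor eyesight. Normally such a CAPTCHA is a machine-generated clip of spoken digits. Many programmers cut corners here, however: they record the ten digits in advance and splice the recordings together at random when generating a clip, and the result looks like this: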

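The design principles are as follows:

Overall appearance

· Number of characters random within a range

· Font size random within a range

· Wave distortion (angle and direction random within a range)

Anti-recognition

· Do not rely too heavily on anti-recognition techniques

· Do not use too large a character set, since it hurts the user experience

Anti-segmentation

· Overlapping, touching characters work better than interference lines

Backup plan

· A completely different CAPTCHA scheme of the same strength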
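The "overall appearance" items in the list above can be sketched in a few lines. This is an illustration under stated assumptions, not a production generator: it skips font rendering entirely and works on a `boolean[][]` pixel grid, the wave is a plain sine shift per column, and the class `CaptchaDesignSketch`, its method names, and the character set (which leaves out lookalikes such as 0/O and 1/I) are all invented for the example.

```java
import java.util.Random;

final class CaptchaDesignSketch {
    // Small character set; dropping lookalike characters is an assumption.
    static final String ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";

    /** Random text whose length is uniform in [minLen, maxLen]. */
    static String randomText(Random rnd, int minLen, int maxLen) {
        int len = minLen + rnd.nextInt(maxLen - minLen + 1);
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    /** One font size per character, each uniform in [minSize, maxSize]. */
    static int[] randomFontSizes(Random rnd, int count, int minSize, int maxSize) {
        int[] sizes = new int[count];
        for (int i = 0; i < count; i++) {
            sizes[i] = minSize + rnd.nextInt(maxSize - minSize + 1);
        }
        return sizes;
    }

    /**
     * Wave distortion: shifts column x vertically by
     * round(amplitude * sin(2*pi*x/period + phase)) pixels.
     * Pixels shifted outside the grid are dropped.
     */
    static boolean[][] wave(boolean[][] src, double amplitude,
            double period, double phase) {
        int h = src.length;
        int w = h == 0 ? 0 : src[0].length;
        boolean[][] dst = new boolean[h][w];
        for (int x = 0; x < w; x++) {
            int dy = (int) Math.round(
                    amplitude * Math.sin(2 * Math.PI * x / period + phase));
            for (int y = 0; y < h; y++) {
                int sy = y - dy; // source row that lands on row y
                if (sy >= 0 && sy < h) {
                    dst[y][x] = src[sy][x];
                }
            }
        }
        return dst;
    }
}
```

Drawing `amplitude`, `period`, and `phase` from a random range per image is what keeps a fixed inverse transform from undoing the distortion.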

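Now that the principles are known, cracking becomes straightforward.

But here is the problem: this time the 12306 CAPTCHA is made of pictures, so none of the methods above apply. Does that mean it cannot be cracked?

Some people argue that the 12306 image library cannot be very large, so it could be scraped in full and then cracked. That is armchair theorizing, of course. There is another method, at once very advanced and very primitive, called "online CAPTCHA solving" or "human-powered CAPTCHA solving".

Some skilled developers send the CAPTCHAs to a home-built "solving" program. Human "solvers" use this program to type in the CAPTCHAs that the automated registration machine runs into, and the answers are sent back to that machine to complete the verification.

For now, this crude but simple method is enough to cope with the current situation.

Closing remarks:

This time 12306 has pulled out a killer move and cut down all ticket-grabbing software in one stroke. When the scalpers are unhappy, the rest of us can buy tickets. It solves the scalper problem and sets programmers a hard puzzle at the same time.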
