Autonomous Vehicles

1. 【Autonomous Vehicles】Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes
Authors: Yang Zhang, Philip David, Boqing Gong
Link: https://arxiv.org/abs/1707.09465v5
Code: https://github.com/YangZhang4065/AdaptationSeg
Abstract: During the last half decade, convolutional neural networks (CNNs) have triumphed over semantic segmentation, which is one of the core tasks in many applications such as autonomous driving. However, to train CNNs requires a considerable amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNNs on photo-realistic synthetic imagery with computer-generated annotations. Despite this, the domain mismatch between the real images and the synthetic data cripples the models' performance. Hence, we propose a curriculum-style learning approach to minimize the domain gap in urban scenery semantic segmentation. The curriculum domain adaptation solves easy tasks first to infer necessary properties about the target domain; in particular, the first task is to learn global label distributions over images and local distributions over landmark superpixels. These are easy to estimate because images of urban scenes have strong idiosyncrasies (e.g., the size and spatial relations of buildings, streets, cars, etc.). We then train a segmentation network while regularizing its predictions in the target domain to follow those inferred properties. In experiments, our method outperforms the baselines on two datasets and two backbone networks. We also report extensive ablation studies about our approach.
(A minimal sketch of this label-distribution regularization appears after entry 2 below.)

2. 【Autonomous Vehicles】Fast Scene Understanding for Autonomous Driving
Authors: Davy Neven, Bert De Brabandere, Stamatios Georgoulis, Marc Proesmans, Luc Van Gool
Link: https://arxiv.org/abs/1708.02550v1
Code: https://github.com/davyneven/fastSceneUnderstanding
Abstract: Most approaches for instance-aware semantic labeling traditionally focus on accuracy. Other aspects like runtime and memory footprint are arguably as important for real-time applications such as autonomous driving. Motivated by this observation and inspired by recent works that tackle multiple tasks with a single integrated architecture, in this paper we present a real-time efficient implementation based on ENet that solves three autonomous driving related tasks at once: semantic scene segmentation, instance segmentation and monocular depth estimation. Our approach builds upon a branched ENet architecture with a shared encoder but different decoder branches for each of the three tasks. The presented method can run at 21 fps at a resolution of 1024x512 on the Cityscapes dataset without sacrificing accuracy compared to running each task separately.
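Relating to entry 1, the sketch below shows one way a global label-distribution regularizer could be attached to a segmentation loss: the target-domain softmax output is pooled into an image-level class distribution and pulled toward a distribution inferred beforehand. The function names, the uniform source-domain loss, and the use of a plain KL term are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch (PyTorch assumed): regularize target-domain predictions so that the
# image-level label distribution of the softmax output matches a distribution
# inferred in an earlier, easier task, in the spirit of curriculum-style adaptation.
import torch
import torch.nn.functional as F

def global_label_distribution(logits):
    # logits: (B, C, H, W) -> per-image class distribution of shape (B, C)
    probs = F.softmax(logits, dim=1)
    return probs.mean(dim=(2, 3))

def curriculum_loss(src_logits, src_labels, tgt_logits, tgt_inferred_dist, lam=0.1):
    # Supervised segmentation loss on the (synthetic) source domain.
    seg_loss = F.cross_entropy(src_logits, src_labels, ignore_index=255)
    # Regularizer: KL between the inferred and predicted target label distributions.
    pred_dist = global_label_distribution(tgt_logits).clamp_min(1e-8)
    reg = F.kl_div(pred_dist.log(), tgt_inferred_dist, reduction="batchmean")
    return seg_loss + lam * reg
```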
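For the branched, shared-encoder architecture in entry 2, here is a minimal sketch of one encoder feeding three task-specific decoder heads (semantic segmentation, instance embedding, monocular depth). The toy layers and channel sizes are placeholders; the paper builds on ENet, not on these modules.

```python
# Sketch of a multi-task network with one shared encoder and three decoder branches.
# Toy layers only; the actual work uses a branched ENet.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, num_classes=19, embed_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(              # shared feature extractor
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        def head(out_ch):                          # one lightweight decoder per task
            return nn.Sequential(
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(64, out_ch, 4, stride=2, padding=1),
            )
        self.semantic_head = head(num_classes)     # per-pixel class logits
        self.instance_head = head(embed_dim)       # per-pixel instance embeddings
        self.depth_head = head(1)                  # per-pixel depth

    def forward(self, x):
        feats = self.encoder(x)
        return self.semantic_head(feats), self.instance_head(feats), self.depth_head(feats)

# Usage: sem, inst, depth = MultiTaskNet()(torch.randn(1, 3, 512, 1024))
```

Sharing the encoder is what keeps the runtime close to that of a single task, since only the lightweight decoder branches are duplicated.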
3. 【Autonomous Vehicles】Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues
Authors: Lu Chi, Yadong Mu
Link: https://arxiv.org/abs/1708.03798v1
Code: https://github.com/abhileshborode/Behavorial-Clonng-Self-driving-cars
Abstract: In recent years, autonomous driving algorithms using low-cost vehicle-mounted cameras have attracted increasing endeavors from both academia and industry. There are multiple fronts to these endeavors, including object detection on roads, 3-D reconstruction etc., but in this work we focus on a vision-based model that directly maps raw input images to steering angles using deep networks. This represents a nascent research topic in computer vision. The technical contributions of this work are three-fold. First, the model is learned and evaluated on real human driving videos that are time-synchronized with other vehicle sensors. This differs from many prior models trained from synthetic data in racing games. Second, state-of-the-art models, such as PilotNet, mostly predict the wheel angles independently on each video frame, which contradicts common understanding of driving as a stateful process. Instead, our proposed model strikes a combination of spatial and temporal cues, jointly investigating instantaneous monocular camera observations and the vehicle's historical states. This is in practice accomplished by inserting carefully-designed recurrent units (e.g., LSTM and Conv-LSTM) at proper network layers. Third, to facilitate the interpretability of the learned model, we utilize a visual back-propagation scheme for discovering and visualizing image regions crucially influencing the final steering prediction. Our experimental study is based on about 6 hours of human driving data provided by Udacity. Comprehensive quantitative evaluations demonstrate the effectiveness and robustness of our model, even under scenarios like drastic lighting changes and abrupt turning. The comparison with other state-of-the-art models clearly reveals its superior performance in predicting the due wheel angle for a self-driving car.
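A minimal sketch of the spatial-plus-temporal idea in entry 3: a per-frame CNN feature extractor followed by an LSTM over a short clip, so each steering prediction depends on recent history rather than a single frame. The layers and sizes below are illustrative assumptions, not the paper's architecture.

```python
# Sketch: per-frame CNN features fed to an LSTM, producing one steering angle per
# time step. Toy layers; not the architecture from the paper.
import torch
import torch.nn as nn

class SteeringRNN(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=64):
        super().__init__()
        self.cnn = nn.Sequential(                 # per-frame spatial features
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)  # temporal cues
        self.fc = nn.Linear(hidden_dim, 1)        # steering angle per time step

    def forward(self, clip):
        # clip: (B, T, 3, H, W) -> angles: (B, T)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.fc(out).squeeze(-1)

# Usage: angles = SteeringRNN()(torch.randn(2, 8, 3, 120, 160))
```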
4. 【Autonomous Vehicles】Free Space Estimation using Occupancy Grids and Dynamic Object Detection
Authors: Raghavender Sahdev
Link: https://arxiv.org/abs/1708.04989v1
Code: https://github.com/raghavendersahdev/Free-Space
Abstract: In this paper we present an approach to estimate free space from a stereo image pair using stochastic occupancy grids, in the domain of autonomous driving on the well-known KITTI benchmark dataset. Based on the generated occupancy grids, we then match two image sequences to compute a top-view representation of the map, i.e., to map the environment. We compute a transformation between the occupancy grids of two successive images and use it to compute the top-view map. Two issues that need to be addressed for mapping are discussed: computing a map and dealing with dynamic objects while computing the map. Dynamic objects are detected in successive images based on an idea similar to separating foreground objects from background objects using motion flow. A novel RANSAC-based segmentation approach is proposed to address this issue.
(A minimal sketch of the occupancy-grid step appears after entry 5 below.)

5. 【Autonomous Vehicles】Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions
Authors: Lex Fridman, Li Ding, Benedikt Jenik, Bryan Reimer
Link: https://arxiv.org/abs/1710.04459v2
Code: https://github.com/scope-lab-vu/deep-nn-car
Abstract: We consider the paradigm of a black box AI system that makes life-critical decisions. We propose an 'arguing machines' framework that pairs the primary AI system with a secondary one that is independently trained to perform the same task. We show that disagreement between the two systems, without any knowledge of underlying system design or operation, is sufficient to arbitrarily improve the accuracy of the overall decision pipeline given human supervision over disagreements. We demonstrate this system in two applications: (1) an illustrative example of image classification and (2) on large-scale real-world semi-autonomous driving data. For the first application, we apply this framework to image classification achieving a reduction from 8.0% to 2.8% top-5 error on ImageNet. For the second application, we apply this framework to Tesla Autopilot and demonstrate the ability to predict 90.4% of system disengagements that were labeled by human annotators as challenging and needing human supervision.
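For entry 4, a minimal sketch of building a simple bird's-eye occupancy grid from a stereo disparity map: 3D points are recovered from disparity and binned into grid cells, with occupied cells marking obstacles and the rest treated as free space. The camera parameters, grid resolution, and the crude height threshold are illustrative assumptions, not values or the method from the paper.

```python
# Sketch: project stereo-derived 3D points into a 2D bird's-eye occupancy grid.
# Parameters below are illustrative only.
import numpy as np

def occupancy_grid(disparity, fx=721.5, baseline=0.54, cx=609.6, cy=172.8,
                   grid_size=(200, 200), cell_m=0.2, min_height=0.3):
    h, w = disparity.shape
    grid = np.zeros(grid_size, dtype=np.uint8)
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    valid = disparity > 0.5                      # ignore invalid / far disparities
    z = fx * baseline / disparity[valid]         # depth from disparity
    x = (us[valid] - cx) * z / fx                # lateral offset
    y = (cy - vs[valid]) * z / fx                # height relative to the camera axis
    obstacle = y > min_height                    # crude test: points above the axis
    col = (x[obstacle] / cell_m + grid_size[1] / 2).astype(int)
    row = (z[obstacle] / cell_m).astype(int)
    keep = (row >= 0) & (row < grid_size[0]) & (col >= 0) & (col < grid_size[1])
    grid[row[keep], col[keep]] = 1               # 1 = occupied, 0 = free
    return grid
```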
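For entry 5, a minimal sketch of the disagreement-based supervision idea: two independently trained models process the same input, and the case is escalated to a human only when their outputs diverge beyond a threshold. The steering-angle setting and the 0.1 threshold are assumptions for illustration, not the paper's exact setup.

```python
# Sketch of an "arguing machines"-style check: escalate to a human when two
# independently trained models disagree about the same decision.
from typing import Callable, Tuple

def arbitrate(primary: Callable[[object], float],
              secondary: Callable[[object], float],
              frame: object,
              threshold: float = 0.1) -> Tuple[float, bool]:
    """Return (decision, needs_human). needs_human is True when the two models
    disagree by more than `threshold` (e.g., radians of steering angle)."""
    a, b = primary(frame), secondary(frame)
    if abs(a - b) > threshold:
        return a, True           # flag for human supervision
    return (a + b) / 2.0, False  # agreement: use the averaged decision

# Usage (with stand-in models):
# decision, ask_human = arbitrate(lambda f: 0.05, lambda f: 0.30, frame=None)
```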
AI&R is an information platform for the artificial intelligence and robotics verticals. Our vision is to become a highway toward AGI (artificial general intelligence), connecting people with people, people with information, and information with information, so that AI and robotics have no barrier to entry. AI and robotics enthusiasts are welcome to follow us for in-depth content every day.