Multi-level structure saliency target detection method based on RCNN

In recent years, with the rapid development of artificial intelligence, the combination of the traditional automobile industry and information technology has driven great progress in research on autonomous driving. Many large companies have invested heavily in research and development in this field. Abroad, Google and Toyota, and in China, Baidu and BYD, among others, have launched autonomous vehicles with encouraging results:

Google’s self-driving cars have traveled safely for more than 140,000 miles;

Toyota announced that its autopilot system will be mass-produced in 2020;

Baidu announced at the end of 2015 that it plans to bring its self-driving cars to commercial mass production within five years. BYD has deepened its cooperation with Baidu to jointly develop driverless cars.

It is foreseeable that in the near future, as the technology continues to develop and improve, autonomous driving will enter a practical stage and spread to ordinary households. People will be able to travel freely without worrying about accidents caused by human error, such as unlicensed driving, speeding, fatigued driving, and drunk driving. Autonomous driving technology therefore has broad application prospects.

1 Autopilot technology

Autonomous driving technology can be divided into approaches based on traditional features and approaches based on deep learning.

In existing autonomous driving systems based on traditional features, target recognition is one of the core tasks. It includes road and road-edge identification, lane line detection, vehicle identification, vehicle type identification, non-motor-vehicle identification, pedestrian identification, traffic sign recognition, obstacle identification and avoidance, and so on. The target recognition system uses computer vision to observe the traffic environment, automatically recognizes targets in the real-time video signal, and provides the basis for real-time driving decisions such as starting, stopping, steering, accelerating, and decelerating.

Because actual road conditions are extremely complicated, the performance of driver assistance technology based on traditional target detection is difficult to improve substantially. Existing autonomous driving systems generally rely on advanced radar systems to compensate, which significantly increases the cost of implementation. With the development of the technology, convolutional neural networks (CNNs) can learn to perceive roads and the vehicles on them directly. After a period of training on correct driving, the network acquires the relevant driving knowledge under actual road conditions, without having to perceive specific road conditions and individual targets explicitly, which greatly improves the performance of driver assistance algorithms.

2 Automated driving technology based on traditional features

Traditional features in autonomous driving refer to hand-designed features such as HOG (histogram of oriented gradients), SIFT (scale-invariant feature transform), and CSS (color self-similarity).

Currently, mainstream autonomous driving techniques are based on video analysis. A video sequence captured in a traffic scene contains various targets, such as pedestrians, cars, roads, obstacles, and assorted background objects. Target objects of the categories of interest need to be identified in each test image and provided to the vehicle control system as a basis for decision making.

The detection and representation of features is a key step, because it determines how the information in the target image is encoded and described. An ideal feature representation should be robust to various interference factors, such as changes in scale and appearance, occlusion, and complex backgrounds.

2.1 Road and lane recognition

Road and lane recognition is fundamental to autonomous driving, as illustrated, for example, by the Caltech lane detector. Common road recognition algorithms are based on image features. They analyze the grayscale, color, texture, and other features of lane lines or road boundaries, and use neural networks, support vector machines, cluster analysis, or region growing to segment the road surface area. This type of method is very robust to changes in road curvature.
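As an illustration of the region-growing idea mentioned above, here is a minimal sketch that grows a road region from a seed pixel on a toy grayscale grid. The function name `region_grow`, the tolerance `tol`, and the pixel values are assumptions made for this example; a real system would work on camera frames and combine color and texture cues as well.

```python
from collections import deque

def region_grow(gray, seed, tol=10):
    """Return the set of (row, col) pixels 4-connected to `seed` whose
    gray value differs from the seed's value by at most `tol`."""
    rows, cols = len(gray), len(gray[0])
    seed_val = gray[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(gray[nr][nc] - seed_val) <= tol):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

# Toy 4x5 image (hypothetical values): road about 100,
# lane marking about 230, roadside about 40.
image = [
    [40, 100, 230, 100, 40],
    [40, 102, 231,  98, 40],
    [40, 101, 229, 103, 40],
    [40,  99, 100, 101, 40],
]
road = region_grow(image, seed=(3, 2))
print(sorted(road))
```

The seed is placed near the bottom center of the image, where the road surface is most likely to appear in a forward-facing camera. The bright lane marking and the dark roadside are excluded because they differ from the seed value by more than the tolerance.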

Road detection methods based on conditional random fields have recently made significant progress. Even so, because of the wide variety of roads and road edges, occlusion by vehicles and roadside debris, and shadows cast by trees and buildings, there is still room for improvement in this most basic task.

2.2 Vehicle Detection Technology

Vehicle detection is one of the hotspots in autonomous driving research. The forward collision warning system is an active safety technology that effectively reduces the incidence of accidents, and it is widely implemented by means of vehicle localization. Such systems exploit the image characteristics of the vehicle itself, such as shadows, symmetry, and edges. For example, the shadow under the vehicle and the U-shaped feature formed with the vehicle's two vertical edges are commonly used to quickly locate the vehicle's region of interest, after which a multi-target tracking algorithm tracks the detected vehicles.
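The symmetry cue can be made concrete with a small example. The sketch below scores how mirror-symmetric a candidate window is about its vertical axis; the function `symmetry_score` and the assumption of 8-bit grayscale pixels are illustrative choices, not a method taken from a specific system.

```python
def symmetry_score(window):
    """Score left-right mirror symmetry of a grayscale window in [0, 1].

    Compares each pixel with its mirror image about the vertical center
    line. Assumes 8-bit pixel values (0..255); 1.0 means the window is
    perfectly symmetric.
    """
    total, count = 0, 0
    for row in window:
        n = len(row)
        for c in range(n // 2):
            total += abs(row[c] - row[n - 1 - c])
            count += 1
    if count == 0:
        return 1.0
    return 1.0 - total / (255.0 * count)

rear_view = [[10, 50, 50, 10],
             [0, 200, 200, 0]]      # symmetric, like a vehicle seen from behind
side_edge = [[0, 0, 255, 255]]      # strongly asymmetric
print(symmetry_score(rear_view), symmetry_score(side_edge))
```

A detector along these lines would evaluate the score on windows placed above candidate shadow regions and keep those above a threshold chosen on validation data.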

2.3 Pedestrian Detection and Collision Avoidance System

Pedestrian detection and collision avoidance systems, whose purpose is pedestrian protection, have also become a research hotspot in autonomous driving. At present, statistical learning methods are the most widely used in pedestrian detection. Feature extraction, and classification and localization, are the two key problems in methods based on statistical learning.

Pedestrian detection based on statistical learning mainly includes detection methods based on generative models (local) and detection methods based on feature classification (holistic):

Detection methods based on generative models usually use local features or limb models to describe local attributes, and combine them with the spatial structure or distribution model of those local features for classification.

Detection methods based on feature classification aim to find a representation that describes pedestrians well. By extracting grayscale, edge, texture, color, and other information about pedestrians, a pedestrian classifier is trained on a large number of samples so that it learns the different variations of the human body from the sample set. The classifier is then used to separate pedestrian targets from the background in the video image and locate them accurately.

In 2005, Dalal and Triggs proposed the Histogram of Oriented Gradients (HOG), a basic feature that is very robust. Many later pedestrian detection algorithms build on HOG and add other features, such as the Scale-Invariant Feature Transform (SIFT), Local Binary Patterns (LBP), Color Self-Similarity (CSS), multi-channel features, and so on.
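To make the idea behind HOG concrete, here is a simplified sketch of the orientation histogram for a single cell. It is an illustration under stated assumptions, not the full descriptor: the names `cell_histogram` and `l2_normalize` are hypothetical, votes go to a single bin without interpolation, and the normalization is applied to one cell rather than to overlapping blocks of cells.

```python
import math

def cell_histogram(cell, n_bins=9):
    """Orientation histogram of one cell.

    Each interior pixel votes for the bin of its unsigned gradient
    direction (0..180 degrees), weighted by the gradient magnitude.
    Gradients use centered differences, so border pixels are skipped.
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for r in range(1, len(cell) - 1):
        for c in range(1, len(cell[0]) - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

def l2_normalize(hist, eps=1e-6):
    """L2-normalize a histogram; eps avoids division by zero."""
    norm = math.sqrt(sum(v * v for v in hist) + eps)
    return [v / norm for v in hist]

# A vertical edge: intensity changes only along x, so all gradient
# energy should fall into the first orientation bin.
patch = [[0, 0, 10, 10]] * 4
hist = cell_histogram(patch)
print(hist)
print(l2_normalize(hist))
```

In a full detector, the histograms of many cells are normalized over blocks and concatenated into one feature vector, which is then passed to a classifier such as a linear SVM.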

Cheng et al. observed that objects have closed boundaries. Building on gradient features, they proposed the binarized normed gradients feature (BING) to predict salient object windows. The method runs very fast and can reach 300 fps. Zhao Yong et al. proposed eHOG, an extension of HOG with good scale invariance: each bin of the HOG gradient histogram is reconstructed into a bit plane, and HOG features are then computed on it. Experiments show that the accuracy is 3 to 6 percentage points higher than the original HOG without a large increase in computation. One problem with the HOG feature is that the entire descriptor is flattened into a single vector, which weakens the local correlation between gradient features that are neighbors in the two-dimensional image plane.

The I-HOG proposed by Zhang Yongjun et al. uses multi-scale feature extraction and a modified construction of the gradient histogram to strengthen the local correlation of pedestrian edge information in the two-dimensional image plane. Compared with the original HOG features, the I-HOG features significantly improve the detection rate. SIFT is an algorithm for detecting local features. It describes each feature point in an image together with its associated scale and orientation, and matches feature points between images for retrieval or for category identification against a reference library. It is not only scale invariant: features can still be detected well when the rotation angle, the image brightness, or the shooting angle changes.

3 Automated driving technology based on deep learning

Target detection and recognition based on video analysis has undergone a transition from traditional features, such as HOG, SIFT, bag of visual words, and Fisher kernel vectors, to deep learning.

HOG descriptors are largely invariant to geometric and photometric transformations of the image. Fisher kernel vectors can unify the dimensions of different features, and little precision is lost during compression. These traditional, intuitive features have achieved good results so far. However, because targets are so varied and change greatly in appearance and viewpoint, target detection based on traditional features has run into bottlenecks that are hard to overcome.

In recent years, the rise of deep learning has greatly improved the performance of detection and recognition across many target categories and states, bringing it close to human level and, in some respects, beyond it. Deep learning features are learned automatically from large amounts of training data and describe the target better than traditional features do.

Deep learning has several common model frameworks, such as autoencoders, sparse coding, Boltzmann machines, deep belief networks, and convolutional neural networks. Among them, models based on the convolutional neural network (CNN) are among the most commonly used and the most actively researched.

In the 1960s, while studying locally sensitive, orientation-selective neurons in the cat's visual cortex, Hubel and Wiesel found that their distinctive network structure could effectively reduce the complexity of feedback neural networks, which later led to the idea of the CNN. The Neocognitron, proposed by K. Fukushima in 1980, was the first network to implement this idea. Subsequently, targets were learned and detected by sliding a scanning window over the image, which greatly improved the efficiency of multi-class target detection and recognition. The most representative work is that of Hinton, a pioneer of deep learning, and his co-authors, who trained deep neural networks to classify the 1.2 million images of LSVRC-2010 and LSVRC-2012 into 1,000 classes and obtained the best results at that time. The main disadvantage of the scanning-window approach is that the number of combinations of window size and position is too large, so the computational cost becomes excessive and the method is difficult to use in practice.
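A quick count shows how fast the number of scanning windows grows. The sketch below is only arithmetic under assumed settings: the 1280 x 720 frame, the three window sizes, and the stride of 8 pixels are hypothetical values chosen for illustration.

```python
def count_windows(img_w, img_h, win_sizes, stride):
    """Count window placements over all window sizes at a given stride."""
    total = 0
    for w, h in win_sizes:
        if w > img_w or h > img_h:
            continue  # this window size does not fit in the image
        nx = (img_w - w) // stride + 1
        ny = (img_h - h) // stride + 1
        total += nx * ny
    return total

sizes = [(64, 128), (128, 256), (256, 512)]  # hypothetical window sizes
n = count_windows(1280, 720, sizes, stride=8)
print(n)  # 23513
```

Even with only three window sizes and a coarse stride, a single frame yields more than twenty thousand windows, and each one would need its own classifier evaluation. Adding more scales and aspect ratios multiplies this number further.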

The CNN approach has been continuously refined in recent years, and its accuracy and computational efficiency have improved greatly. In 2014, Girshick et al. proposed R-CNN, which generates about 2,000 candidate regions for each image, feeds each region into a CNN to extract a fixed-length feature vector, and finally classifies the features with category-specific support vector machines (SVMs). Because every candidate region has to be passed through AlexNet separately, detection is slow, so Kaiming He et al. proposed SPPnet. SPPnet no longer crops or warps the image to fit the fixed AlexNet input size, but instead accepts images of any size as input.
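The reason SPPnet can accept inputs of any size is its spatial pyramid pooling layer, which turns a feature map of arbitrary size into a fixed-length vector. The following sketch shows the idea for a single 2D feature map in plain Python. The pyramid levels `(1, 2, 4)` and the helper names are assumptions for this example rather than the configuration used in the paper.

```python
def _bin_edges(size, n, i):
    """Start (inclusive) and end (exclusive) of bin i when `size` is split into n bins."""
    start = (i * size) // n
    end = -((-(i + 1) * size) // n)  # ceil((i + 1) * size / n)
    return start, max(end, start + 1)

def spp(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map over an n x n grid for each n in `levels`.

    The output length is sum(n * n for n in levels), whatever the input size.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            r0, r1 = _bin_edges(h, n, i)
            for j in range(n):
                c0, c1 = _bin_edges(w, n, j)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled

small = [[r * 5 + c for c in range(5)] for r in range(3)]  # 3 x 5 map
large = [[r * 4 + c for c in range(4)] for r in range(7)]  # 7 x 4 map
vec_small, vec_large = spp(small), spp(large)
print(len(vec_small), len(vec_large))
```

Both feature maps produce vectors of the same length (1 + 4 + 16 = 21 values), so the fully connected layers that follow can keep a fixed input size even though the images differ.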

Building on SPPnet, Fast R-CNN uses a saliency detection method to extract candidate regions from the original image and maps the coordinates of each region onto the convolutional feature map. During detection, an ROI pooling layer selects the area of the feature map that corresponds to the mapped coordinates, and only that part of the feature map is sent to the classifier. There is no need to run the convolution separately for every candidate region, which greatly improves detection speed.
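The two steps described above, projecting a region onto the feature map and pooling it to a fixed size, can be sketched as follows. This is a simplified single-channel illustration: the function names, the stride of 16, and the 2 x 2 output size are assumptions for the example, and real implementations operate on multi-channel tensors.

```python
def project_box(box, stride=16):
    """Map an (x0, y0, x1, y1) image box to feature-map cells (end-exclusive)."""
    x0, y0, x1, y1 = box
    fx0, fy0 = x0 // stride, y0 // stride
    fx1 = max(fx0 + 1, -(-x1 // stride))  # ceil(x1 / stride)
    fy1 = max(fy0 + 1, -(-y1 // stride))  # ceil(y1 / stride)
    return fx0, fy0, fx1, fy1

def _edges(size, n, i):
    """Start and end (exclusive) of bin i when `size` cells are split into n bins."""
    start = (i * size) // n
    end = -((-(i + 1) * size) // n)
    return start, max(end, start + 1)

def roi_pool(feature_map, roi, out_size=2):
    """Max-pool the ROI (x0, y0, x1, y1, end-exclusive) to out_size x out_size."""
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_size):
        r0, r1 = _edges(h, out_size, i)
        row = []
        for j in range(out_size):
            c0, c1 = _edges(w, out_size, j)
            row.append(max(feature_map[y0 + r][x0 + c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled

# One shared 6 x 6 feature map, computed once for the whole image.
fmap = [[r * 6 + c for c in range(6)] for r in range(6)]
roi = project_box((16, 32, 80, 96))   # hypothetical candidate box in pixels
print(roi)                  # (1, 2, 5, 6)
print(roi_pool(fmap, roi))  # [[20, 22], [32, 34]]
```

The feature map is computed once and shared by all candidate boxes, so the cost per region is only the pooling step and the classifier.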

In 2015, Ren et al. proposed Faster R-CNN, which adds a region proposal network (RPN) to the previous design so that the convolutional feature map is computed only once and shared between proposal generation and detection. Faster R-CNN thus further accelerates Fast R-CNN. At the ICCV international conference in December 2015, Dr. Zou Wenbin proposed a multi-level structure salient object detection method based on R-CNN.
