AlphaPose is a real-time multi-person pose estimation system

The MVIG laboratory of Lu Cewu's team at Shanghai Jiao Tong University has recently released an upgraded version of its previously open-sourced real-time human pose estimation system, AlphaPose. The new system is built on the PyTorch framework, achieves the current highest accuracy of 71 mAP on the COCO pose estimation benchmark, and runs at an average of 20 FPS, 3 times faster than Mask R-CNN.


In February of this year, the MVIG laboratory of Lu Cewu's team at Shanghai Jiao Tong University released the AlphaPose system, the first open-source pose estimation system to reach 70+ mAP on the COCO dataset. The highlight of this update is improved real-time performance with no loss in accuracy.

The new system uses the PyTorch framework and achieves 71 mAP on the COCO validation set, the standard benchmark for pose estimation (17% higher than OpenPose and 8% higher than Mask R-CNN). At the same time, it runs at 20 FPS (a relative improvement of 66% over OpenPose and 300% over Mask R-CNN).

Experience the speed of AlphaPose after the upgrade

Detection accuracy unchanged, average speed 3 times that of Mask R-CNN

Human keypoint detection is essential for describing human pose and predicting human behavior, and it underlies many computer vision tasks. It has broad application prospects in action classification, abnormal behavior detection, and human-computer interaction, making it a topic in computer vision that is both valuable to research and highly challenging.

The AlphaPose system is built on the RMPE two-step framework (ICCV 2017 paper) proposed by the MVIG group at Shanghai Jiao Tong University. Compared with other open-source systems, its accuracy is significantly higher: 17% above OpenPose and 8.2% above Mask R-CNN.
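
To make the two-step idea concrete, here is a minimal Python sketch of a top-down pipeline: a detector first proposes person boxes, then a single-person pose estimator runs inside each box. The detect_persons and estimate_pose functions are hypothetical placeholders standing in for the real detector and pose network, not AlphaPose's actual code.

```python
import numpy as np

def detect_persons(image):
    # Hypothetical stand-in for the human detector (step one of the
    # two-step framework); a real system would run an object detector here.
    h, w = image.shape[:2]
    return [(0, 0, w // 2, h), (w // 2, 0, w, h)]  # dummy person boxes

def estimate_pose(crop):
    # Hypothetical stand-in for the single-person pose estimator (step two);
    # a real system would run the pose network on the cropped region.
    h, w = crop.shape[:2]
    return [(w // 2, h // 2)] * 17  # dummy COCO-style 17 keypoints

def two_step_pose_estimation(image):
    """Top-down pipeline: detect people first, then estimate a pose per box."""
    poses = []
    for (x1, y1, x2, y2) in detect_persons(image):
        crop = image[y1:y2, x1:x2]
        keypoints = estimate_pose(crop)
        # Map keypoints from crop coordinates back to image coordinates.
        poses.append([(x + x1, y + y1) for (x, y) in keypoints])
    return poses

if __name__ == "__main__":
    dummy_image = np.zeros((480, 640, 3), dtype=np.uint8)
    print(two_step_pose_estimation(dummy_image))
```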

After the upgrade, the performance of each open-source framework on the COCO validation set, with timing measured on a single GTX 1080 Ti GPU, is as follows:


Built on the PyTorch framework, with an attention module introduced into the pose estimation model

The new version of the AlphaPose system is built on the PyTorch framework. Thanks to the flexibility of Python, the new system is more user-friendly, easier to install and use, and supports both Linux and Windows to facilitate secondary development. In addition, the system accepts image, video, and camera input, and computes multi-person pose results online in real time.

To maintain accuracy while improving speed, the new version of AlphaPose proposes a new pose estimation model. The backbone of the model is ResNet-101, with SE blocks added as attention modules in its downsampling stages. Extensive experiments show that introducing attention modules into pose estimation models improves performance, and adding SE blocks only in the downsampling stages lets the attention take effect at a lower computational cost.
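
As a rough illustration of the idea, the following PyTorch sketch shows a standard SE (Squeeze-and-Excitation) block attached to a strided downsampling stage. The channel sizes and reduction ratio here are assumptions for illustration, not AlphaPose's exact configuration.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation block: global pooling followed by a small
    bottleneck MLP that produces per-channel attention weights."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # squeeze: global context
        self.fc = nn.Sequential(                      # excitation: channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                  # reweight channels

# Example: attach SE attention to a downsampling (strided) stage.
downsample_stage = nn.Sequential(
    nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(512),
    nn.ReLU(inplace=True),
    SEBlock(512),
)

features = downsample_stage(torch.randn(1, 256, 64, 48))
print(features.shape)  # torch.Size([1, 512, 32, 24])
```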

In addition, PixelShuffle + Conv is used for three rounds of upsampling to output the keypoint heatmaps. Traditional upsampling methods use deconvolution or bilinear interpolation. The advantage of PixelShuffle is that it increases resolution without discarding feature information: compared with bilinear interpolation, its computational cost is lower; compared with deconvolution, it does not produce checkerboard artifacts.
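
A minimal PyTorch sketch of such an upsampling head is shown below: each step is a 3x3 convolution followed by PixelShuffle, repeated three times, with a final 1x1 convolution producing one heatmap per keypoint. The channel widths and input resolution are illustrative assumptions, not the actual AlphaPose settings.

```python
import torch
import torch.nn as nn

def pixelshuffle_upsample(in_channels, out_channels, scale=2):
    """One upsampling step: a convolution expands channels by scale**2,
    then PixelShuffle rearranges them into a higher-resolution map."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels * scale ** 2, kernel_size=3, padding=1),
        nn.PixelShuffle(scale),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )

# Illustrative head: three PixelShuffle upsampling steps (8x total) followed
# by a 1x1 convolution that outputs one heatmap per COCO keypoint.
num_keypoints = 17
head = nn.Sequential(
    pixelshuffle_upsample(2048, 512),
    pixelshuffle_upsample(512, 256),
    pixelshuffle_upsample(256, 128),
    nn.Conv2d(128, num_keypoints, kernel_size=1),
)

backbone_features = torch.randn(1, 2048, 8, 6)   # e.g. ResNet-101 output
heatmaps = head(backbone_features)
print(heatmaps.shape)  # torch.Size([1, 17, 64, 48])
```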

In terms of system architecture, the new version of AlphaPose uses a multi-stage pipeline, with multiple threads cooperating to maximize throughput.
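
The sketch below shows the general pattern in plain Python: each pipeline stage runs in its own thread and passes results to the next stage through a queue, so detection and pose estimation can overlap across frames. The detect and estimate functions are hypothetical placeholders, not AlphaPose's actual implementation.

```python
import queue
import threading

def run_stage(stage_fn, in_queue, out_queue):
    """Run one pipeline stage in a loop until a None sentinel arrives."""
    while True:
        item = in_queue.get()
        if item is None:            # sentinel: shut the stage down
            out_queue.put(None)
            break
        out_queue.put(stage_fn(item))

def detect(frame):
    return (frame, ["box"])                    # placeholder person detector

def estimate(item):
    frame, boxes = item
    return (frame, ["pose"] * len(boxes))      # placeholder pose estimator

frames_q, boxes_q, poses_q = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=run_stage, args=(detect, frames_q, boxes_q)).start()
threading.Thread(target=run_stage, args=(estimate, boxes_q, poses_q)).start()

for frame_id in range(5):           # feed a few dummy frames
    frames_q.put(frame_id)
frames_q.put(None)

while (result := poses_q.get()) is not None:
    print(result)
```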

On the COCO validation set (an average of 4.6 people per image), the AlphaPose system currently runs at 20 FPS with an accuracy of 71 mAP. In crowded scenes (an average of 15 people per image), it still maintains more than 10 FPS.
