Best of arXiv.org for AI, Machine Learning, and Deep Learning – August 2020

Print Friendly, PDF & Email

In this recurring monthly feature, we filter recent research papers appearing on the arXiv.org preprint server for compelling subjects relating to AI, machine learning and deep learning – from disciplines including statistics, mathematics and computer science – and provide you with a useful “best of” list for the past month. Researchers from all over the world contribute to this repository as a prelude to the peer review process for publication in traditional journals. arXiv contains a veritable treasure trove of statistical learning methods you may use one day in the solution of data science problems. The articles listed below represent a small fraction of all articles appearing on the preprint server. They are listed in no particular order with a link to each paper along with a brief overview. Links to GitHub repos are provided when available. Especially relevant articles are marked with a “thumbs up” icon. Consider that these are academic research papers, typically geared toward graduate students, post docs, and seasoned professionals. They generally contain a high degree of mathematics so be prepared. Enjoy!

MusPy: A Toolkit for Symbolic Music Generation

This paper presents MusPy, an open source Python library for symbolic music generation. MusPy provides easy-to-use tools for essential components in a music generation system, including data set management, data I/O, data pre-processing and model evaluation. In order to showcase its potential, statistical analysis of the eleven data sets currently supported by MusPy is presented. Moreover, a cross-data set generalizability experiment is conducted by training an autoregressive model on each data set and measuring held-out likelihood on the others—a process which is made easier by MusPy’s data set management system. The results provide a map of domain overlap between various commonly used data sets and show that some data sets contain more representative cross-genre samples than others. Along with the data set analysis, these results might serve as a guide for choosing data sets in future research. Source code and documentation are available HERE.

Context-aware Feature Generation for Zero-shot Semantic Segmentation

Existing semantic segmentation models heavily rely on dense pixel-wise annotations. To reduce the annotation pressure, this paper focuses on a challenging task named zero-shot semantic segmentation, which aims to segment unseen objects with zero annotations. This task can be accomplished by transferring knowledge across categories via semantic word embeddings. This paper proposes a novel context-aware feature generation method for zero-shot segmentation named CaGNet. In particular, with the observation that a pixel-wise feature highly depends on its contextual information, the technique inserts a contextual module in a segmentation network to capture the pixel-wise contextual information, which guides the process of generating more diverse and context-aware features from semantic word embeddings. The method achieves state-of-the-art results on three benchmark data sets for zero-shot segmentation. Code associated with the paper is available HERE.

A Federated Approach for Fine-Grained Classification of Fashion Apparel

As online retail services proliferate and are pervasive in modern lives, applications for classifying fashion apparel features from image data are becoming more indispensable. Online retailers, from leading companies to start-ups, can leverage such applications in order to increase profit margin and enhance the consumer experience. Many notable schemes have been proposed to classify fashion items, however, the majority of which focused upon classifying basic-level categories, such as T-shirts, pants, skirts, shoes, bags, and so forth. In contrast to most prior efforts, this paper aims to enable an in-depth classification of fashion item attributes within the same category. Beginning with a single dress, the goal is to classify the type of dress hem, the hem length, and the sleeve length. The proposed scheme is comprised of three major stages: (a) localization of a target item from an input image using semantic segmentation, (b) detection of human key points (e.g., point of shoulder) using a pre-trained CNN and a bounding box, and (c) three phases to classify the attributes using a combination of algorithmic approaches and deep neural networks. The experimental results demonstrate that the proposed scheme is highly effective, with all categories having average precision of above 93.02 and outperforms existing Convolutional Neural Networks (CNNs)-based schemes.

ATG-PVD: Ticketing Parking Violations on A Drone

This paper introduces a novel suspect-and-investigate framework, which can be easily embedded in a drone for automated parking violation detection (PVD). The proposed framework consists of: 1) SwiftFlow, an efficient and accurate convolutional neural network (CNN) for unsupervised optical flow estimation; 2) Flow-RCNN, a flow-guided CNN for car detection and classification; and 3) an illegally parked car (IPC) candidate investigation module developed based on visual SLAM. The proposed framework was successfully embedded in a drone from ATG Robotics. The experimental results demonstrate that, firstly, the proposed SwiftFlow outperforms all other state-of-the-art unsupervised optical flow estimation approaches in terms of both speed and accuracy; secondly, IPC candidates can be effectively and efficiently detected by the proposed Flow-RCNN, with a better performance than our baseline network, Faster-RCNN; finally, the actual IPCs can be successfully verified by the investigation module after drone re-localization.

A Composable Specification Language for Reinforcement Learning Tasks

Reinforcement learning is a promising approach for learning control policies for robot tasks. However, specifying complex tasks (e.g., with multiple objectives and safety constraints) can be challenging, since the user must design a reward function that encodes the entire task. Furthermore, the user often needs to manually shape the reward to ensure convergence of the learning algorithm. This paper proposes a language for specifying complex control tasks, along with an algorithm that compiles specifications in our language into a reward function and automatically performs reward shaping. The approach is implemented in a tool called SPECTRL, which is shown to outperform several state-of-the-art baselines.

DiverseNet: When One Right Answer is not Enough

Many structured prediction tasks in machine vision have a collection of acceptable answers, instead of one definitive ground truth answer. Segmentation of images, for example, is subject to human labeling bias. Similarly, there are multiple possible pixel values that could plausibly complete occluded image regions. State-of-the art supervised learning methods are typically optimized to make a single test-time prediction for each query, failing to find other modes in the output space. Existing methods that allow for sampling often sacrifice speed or accuracy. This paper introduces a simple method for training a neural network, which enables diverse structured predictions to be made for each test-time query. For a single input, we learn to predict a range of possible answers. We compare favorably to methods that seek diversity through an ensemble of networks. Such stochastic multiple choice learning faces mode collapse, where one or more ensemble members fail to receive any training signal. The best performing solution can be deployed for various tasks, and just involves small modifications to the existing single-mode architecture, loss function, and training regime. The method results in quantitative improvements across three challenging tasks: 2D image completion, 3D volume estimation, and flow prediction.

Masked Face Recognition for Secure Authentication

With the recent world-wide COVID-19 pandemic, using face masks have become an important part of our lives. People are encouraged to cover their faces when in public area to avoid the spread of infection. The use of these face masks has raised a serious question on the accuracy of the facial recognition system used for tracking school/office attendance and to unlock phones. Many organizations use facial recognition as a means of authentication and have already developed the necessary datasets in-house to be able to deploy such a system. Unfortunately, masked faces make it difficult to be detected and recognized, thereby threatening to make the in-house datasets invalid and making such facial recognition systems inoperable. This paper addresses a methodology to use the current facial datasets by augmenting it with tools that enable masked faces to be recognized with low false-positive rates and high overall accuracy, without requiring the user data set to be recreated by taking new pictures for authentication. An open-source tool is presented, MaskTheFace to mask faces effectively creating a large data set of masked faces.

Periodic Stochastic Gradient Descent with Momentum for Decentralized Training

Decentralized training has been actively studied in recent years. Although a wide variety of methods have been proposed, yet the decentralized momentum SGD method is still underexplored. This paper proposes a novel periodic decentralized momentum SGD method, which employs the momentum schema and periodic communication for decentralized training. With these two strategies, as well as the topology of the decentralized training system, the theoretical convergence analysis of our proposed method is difficult. The research addresses this challenging problem and provides the condition under which our proposed method can achieve the linear speedup regarding the number of workers.

Adaptive Serverless Learning

With the emergence of distributed data, training machine learning models in the serverless manner has attracted increasing attention in recent years. Numerous training approaches have been proposed in this regime, such as decentralized SGD. However, all existing decentralized algorithms only focus on standard SGD. It might not be suitable for some applications, such as deep factorization machine in which the feature is highly sparse and categorical so that the adaptive training algorithm is needed. This paper proposes a novel adaptive decentralized training approach, which can compute the learning rate from data dynamically. To the best of the researcher’s knowledge, this is the first adaptive decentralized training approach. The theoretical results reveal that the proposed algorithm can achieve linear speedup with respect to the number of workers.

How Do the Hearts of Deep Fakes Beat? Deep Fake Source Detection via Interpreting Residuals with Biological Signals

Fake portrait video generation techniques have been posing a new threat to the society with photorealistic deep fakes for political propaganda, celebrity imitation, forged evidences, and other identity related manipulations. Following these generation techniques, some detection approaches have also been proved useful due to their high classification accuracy. Nevertheless, almost no effort was spent to track down the source of deep fakes. This paper proposes an approach not only to separate deep fakes from real videos, but also to discover the specific generative model behind a deep fake. Some pure deep learning based approaches try to classify deep fakes using CNNs where they actually learn the residuals of the generator. It is believed that these residuals contain more information and it can reveal these manipulation artifacts by disentangling them with biological signals. The key observation yields that the spatiotemporal patterns in biological signals can be conceived as a representative projection of residuals. To justify this observation, PPG cells are extracted from real and fake videos and fed to a state-of-the-art classification network for detecting the generative model per video.

Sign up for the free insideBIGDATA newsletter.

Speak Your Mind

*