• Jafar Tahmoresnezhad

  • Associate Professor
  • Department of Information Technology Engineering
Naimeh Alipour

Heterogeneous Domain Adaptation via Landmark Selection and Sparse Coding



2021

Abstract

Machine learning and data mining approaches have achieved considerable success in numerous applications, including object detection, text categorization, and information retrieval. The fundamental problem in these applications is that, most of the time, there is not enough labeled data to train the classification model, which reduces the model's ability to generalize to new instances. As a result, domain adaptation strategies are deployed to leverage information from a source domain to assist the target domain's learning tasks. However, most existing domain adaptation methods assume a homogeneous setting in which the source and target domain data are sampled from the same feature space with different distributions. In real-world scenarios, it is not easy to find a source domain that is drawn from exactly the same feature space as the target domain. Heterogeneous domain adaptation (HDA) has therefore been proposed to transfer knowledge across domains that differ in both feature space and data distribution. In this paper, we focus on semi-supervised heterogeneous domain adaptation, where a limited number of labeled target instances are available during the learning process. To this end, two novel approaches are proposed: sparse representation and landmark selection (SRLS), and statistical distribution alignment and progressive pseudo label selection (SDA-PPLS). The first proposed method (SRLS) finds a new shared feature representation of instances via sparse coding to resolve the discrepancy between feature spaces. It then minimizes the marginal and conditional distribution divergence in the shared feature space through an instance-based method. Moreover, SRLS preserves the geometrical information of instances during the distribution alignment process in order to obtain a discriminative feature representation.
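As a rough illustration of what "marginal and conditional distribution divergence" can mean once both domains live in a shared feature space, the sketch below computes a linear-kernel maximum mean discrepancy (MMD), i.e. the squared distance between domain means, overall and per class. The abstract does not state which divergence measure SRLS uses, so MMD is an assumption here, and the function names are hypothetical.

```python
from collections import defaultdict


def mean_vector(rows):
    """Column-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def marginal_mmd(source, target):
    """Squared linear-kernel MMD: distance between the two domain means."""
    return squared_distance(mean_vector(source), mean_vector(target))


def conditional_mmd(source, source_labels, target, target_labels):
    """Sum of per-class mean distances over classes present in both domains.

    In a semi-supervised setting, target_labels would mix the few true
    labels with pseudo labels (an assumption of this sketch).
    """
    by_class_s, by_class_t = defaultdict(list), defaultdict(list)
    for x, y in zip(source, source_labels):
        by_class_s[y].append(x)
    for x, y in zip(target, target_labels):
        by_class_t[y].append(x)
    shared = set(by_class_s) & set(by_class_t)
    return sum(marginal_mmd(by_class_s[c], by_class_t[c]) for c in shared)
```

An alignment method would search for a representation (or instance weights) that drives both quantities toward zero; this sketch only measures the gap.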
The second proposed method (SDA-PPLS) finds a new shared feature representation by learning two transformation matrices, one for the source domain and one for the target domain. To mitigate the distribution gap in this new feature space, SDA-PPLS aligns first-order and second-order statistical information simultaneously, which improves the performance of the target classification model. In addition, to separate instances into distinct classes, SDA-PPLS reduces the divergence between class-conditional distributions by means of target pseudo labels. These pseudo labels are predicted from the structural information and neighborhoods of instances in the source and target domains. Finally, to prevent inaccurate pseudo labels from propagating to the next iteration, a progressive technique is used to select the instances whose pseudo labels have higher probability. Experimental results on several real-world datasets covering image-to-image, text-to-text, and text-to-image tasks demonstrate that the proposed methods outperform other state-of-the-art HDA methods.
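The progressive selection idea can be sketched as follows: at each iteration only the most confident pseudo-labeled target instances are kept, and the kept share grows as training proceeds. The linear schedule (`iteration / total_iterations`), the per-class ranking, and the function name are assumptions made for illustration; the abstract does not specify the selection rule used by SDA-PPLS.

```python
def select_pseudo_labels(probabilities, iteration, total_iterations):
    """Return {instance_index: pseudo_label} for the most confident instances.

    probabilities: one list of class probabilities per target instance.
    The kept fraction per class grows linearly with the iteration number
    (hypothetical schedule, not taken from the source).
    """
    fraction = iteration / total_iterations

    # Group instances by predicted class, remembering their confidence.
    per_class = {}
    for idx, probs in enumerate(probabilities):
        label = max(range(len(probs)), key=probs.__getitem__)
        per_class.setdefault(label, []).append((probs[label], idx))

    selected = {}
    for label, scored in per_class.items():
        scored.sort(reverse=True)  # most confident first
        keep = max(1, int(fraction * len(scored)))  # always keep at least one
        for _, idx in scored[:keep]:
            selected[idx] = label
    return selected
```

Ranking within each predicted class, rather than globally, keeps every class represented among the selected instances, which matters when the selected pseudo labels feed a class-conditional alignment step.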

Keywords: Heterogeneous domain adaptation, transfer learning, distribution alignment, cross-lingual text categorization, object detection.




---