Shiqi Yang/杨, 诗琪

Ph.D.

shiqi.yang147.jp@gmail.com

Tokyo, Japan

Chief Research Scientist/Manager

SB Intuitions, SoftBank, Tokyo, Japan

Since 2024.12, I have been a chief research scientist and the leader/manager of the Creative Vision team at SB Intuitions, Tokyo, and I am also affiliated with SoftBank Corp. From 2023.10 to 2024.11, I worked as an audio-visual research scientist at Sony Group Corporation, Tokyo. Before that, I was a Ph.D. student in the Learning and Machine Perception (LAMP) team (2019.10 ~ 2023.7), advised by Dr. Joost van de Weijer at the Computer Vision Center, Autonomous University of Barcelona, Spain.

Looking for researchers and engineers based in Tokyo, where you can build foundation models from scratch.

#Multi-Modal Learning #Transfer Learning

Currently, I am leading industrial projects on visual generation and unified visual manipulation models (towards multimodal generation).
Previously, I worked on multi-modal (especially audio-visual) generation and model adaptation.
During my Ph.D., I focused on how to efficiently adapt pretrained models to real-world environments under domain and category shift in an unsupervised manner. The related research topics cover zero-shot learning and source-free/test-time/continual/open-set domain adaptation.

CV (updated May 2025)

News

[2025.5] We will host the 2nd workshop on Audio-Visual Generation and Learning (AVGenL) at ICCV 2025. We will have 2 industrial sessions this year: Hedra and Veo 3 from Google DeepMind. Stay tuned for more details.
[2025.2] "One-way ticket" is accepted by CVPR 2025.
[2025.1] "Mining Your Own Secrets", "InternLCM" and "One-Prompt-One-Story" (spotlight) are accepted by ICLR 2025.
[2024.10] Gave visiting talks at MICC Lab (Prof. Andrew Bagdanov) at the University of Florence and MHUG Lab (Prof. Nicu Sebe) at the University of Trento.
[2024.9] Our paper Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models is accepted by NeurIPS 2024.
[2024.4] We are organizing an ECCV 2024 workshop, AVGenL: Audio-Visual Generation and Learning. Please check the site for the CfP and speakers.
[2023.12] My doctoral thesis received the Pioneer Awards 2023 - CERCA.
[2023.9] Our paper 'Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing' is accepted by NeurIPS 2023.
[2023.8] The extended version of NRC is accepted by IEEE TPAMI.
[2023.6] 'Casting a BAIT for Offline and Online Source-free Domain Adaptation' is finally accepted by CVIU.
[2023.1] Gave a visiting talk in Prof. Maria Brbic's group at EPFL.
[2022.11] I will present our work on model adaptation under domain and category shift at the TrustML Young Scientist Seminars (hosted by RIKEN AIP) on Dec. 7.
[2022.9] 'Attracting and Dispersing: A Simple Approach for Source-free Domain Adaptation' is accepted by NeurIPS 2022 as a Spotlight, and our paper 'Positive Pair Distillation Considered Harmful: Continual Meta Metric Learning for Lifelong Object Re-Identification' is accepted by BMVC 2022.
[2021.9] 'Exploiting the Intrinsic Neighborhood Structure for Source-free Domain Adaptation' is accepted by NeurIPS 2021.
[2021.7] 'Generalized Source-free Domain Adaptation' is accepted by ICCV 2021.

Full Publications

Journal

Trust your Good Friends: Source-free Domain Adaptation by Reciprocal Neighborhood Clustering

Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz, Shangling Jui, Jian Yang

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023 [paper][arxiv]

Casting a BAIT for Offline and Online Source-free Domain Adaptation

Shiqi Yang, Yaxing Wang, Luis Herranz, Shangling Jui, Joost van de Weijer

Computer Vision and Image Understanding (CVIU), 2023 [paper][arxiv][code]

On Implicit Attribute Localization for Generalized Zero-Shot Learning

Shiqi Yang, Kai Wang, Luis Herranz, Joost van de Weijer

IEEE Signal Processing Letters, 2021 [paper][arXiv]

Preprint

OpenMU: Your Swiss Army Knife for Music Understanding

Mengjie Zhao, Zhi Zhong, Zhuoyuan Mao, Shiqi Yang, Wei-Hsiang Liao, Shusuke Takahashi, Hiromi Wakaki, Yuki Mitsufuji

preprint, 2024 [arxiv][code]

GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models

M. Jehanzeb Mirza, Mengjie Zhao, Zhuoyuan Mao, Sivan Doveh, Wei Lin, Paul Gavrikov, Michael Dorkenwald, Shiqi Yang, Saurav Jha, Hiromi Wakaki, Yuki Mitsufuji, Horst Possegger, Rogerio Feris, Leonid Karlinsky, James Glass

preprint, 2024 [arxiv]

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

Shiqi Yang, Zhi Zhong, Mengjie Zhao, Shusuke Takahashi, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

preprint, 2024 [arxiv][demo]

MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing

Kangneng Zhou, Daiheng Gao, Xuan Wang, Jie Zhang, Peng Zhang, Xusen Sun, Longhao Zhang, Shiqi Yang, Bang Zhang, Liefeng Bo, Yaxing Wang

preprint, 2023 [arxiv]

A Critical Look at the Current Usage of Foundation Model for Dense Recognition Task

Shiqi Yang, Atsushi Hashimoto, Yoshitaka Ushiku

preprint, 2023 [arXiv]

OneRing: A Simple Method for Source-free Open-partial Domain Adaptation

Shiqi Yang, Yaxing Wang, Kai Wang, Shangling Jui, Joost van de Weijer

preprint, 2022 [project][arXiv][code]

International Conference

One-way ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models

Senmao Li, Lei Wang, Kai Wang, Tao Liu, Jiehang Xie, Joost van de Weijer, Fahad Shahbaz Khan, Shiqi Yang, Yaxing Wang, Jian Yang

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025 [arxiv]

Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models

Saurav Jha, Shiqi Yang*, Masato Ishii, Mengjie Zhao, Christian Simon, Muhammad Jehanzeb Mirza, Dong Gong, Lina Yao, Shusuke Takahashi, Yuki Mitsufuji

International Conference on Learning Representations (ICLR) 2025 [arxiv] [openreview] [project]

One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt

Tao Liu, Kai Wang, Senmao Li, Joost van de Weijer, Fahad Shahbaz Khan, Shiqi Yang, Yaxing Wang, Jian Yang, Ming-Ming Cheng

International Conference on Learning Representations (ICLR) 2025 Spotlight [arxiv] [openreview][project]

InternLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration

Senmao Li, Kai Wang, Joost van de Weijer, Fahad Shahbaz Khan, Chun-Le Guo, Shiqi Yang, Yaxing Wang, Jian Yang, Ming-Ming Cheng

International Conference on Learning Representations (ICLR) 2025 [arxiv] [openreview][project]

Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models

Senmao Li, Taihang Hu, Fahad Shahbaz Khan, Linxuan Li, Shiqi Yang, Yaxing Wang, Ming-Ming Cheng, Jian Yang

Advances in Neural Information Processing Systems (NeurIPS) 2024 [project][arxiv][code]

SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond

Marco Comunità, Zhi Zhong, Akira Takahashi, Shiqi Yang, Mengjie Zhao, Koichi Saito, Yukara Ikemiya, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji

International Society for Music Information Retrieval Conference (ISMIR) 2024 [arxiv]

Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing

Kai Wang, Fei Yang, Shiqi Yang, Muhammad Atif Butt, Joost van de Weijer

Advances in Neural Information Processing Systems (NeurIPS) 2023 [paper][arxiv][code]

Positive Pair Distillation Considered Harmful: Continual Meta Metric Learning for Lifelong Object Re-Identification

Kai Wang, Chenshen Wu, Andrew D. Bagdanov, Xialei Liu, Shiqi Yang, Shangling Jui, Joost van de Weijer

British Machine Vision Conference (BMVC) 2022 [arxiv][code]

Attracting and Dispersing: A Simple Approach for Source-free Domain Adaptation

Shiqi Yang, Yaxing Wang, Kai Wang, Shangling Jui, Joost van de Weijer

Advances in Neural Information Processing Systems (NeurIPS) 2022 Spotlight [project][paper][arXiv][code]

Exploiting the Intrinsic Neighborhood Structure for Source-free Domain Adaptation

Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz, Shangling Jui

Advances in Neural Information Processing Systems (NeurIPS) 2021 [project][paper][arXiv][code]

Generalized Source-free Domain Adaptation

Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz, Shangling Jui

International Conference on Computer Vision (ICCV) 2021 [project][paper][arXiv][code][video]

Parallel Convolutional Networks for Image Recognition via a Discriminator

Shiqi Yang, Gang Peng

Asian Conference on Computer Vision (ACCV) 2018 [paper][arXiv]

Attention to Refine Through Multi Scales for Semantic Segmentation

Shiqi Yang, Gang Peng

Pacific-Rim Conference on Multimedia (PCM) 2018 [paper][arXiv]

Experience

Dec. 2024 ~, Chief Research Scientist and Manager at SB Intuitions, SoftBank, Tokyo, Japan
Oct. 2023 ~ Nov. 2024, Research Scientist at Sony Group Corporation, Tokyo, Japan
Jan. 2023 ~ Jun. 2023, Research Intern at OMRON SINIC X, Tokyo, Japan
Oct. 2018 ~ Mar. 2019, Guest Research Associate at Kyoto University, Japan

Invited Talks, Awards and Activities

Visiting talks at MICC Lab (Prof. Andrew Bagdanov) at the University of Florence and MHUG Lab (Prof. Nicu Sebe) at the University of Trento, Italy, 2024.10
Pioneer Awards 2023, CERCA Research Center of Catalonia, Spain, 2023.12
Visiting talk in Prof. Maria Brbic's group at EPFL, Switzerland, 2023.1
Invited talk at the TrustML Young Scientist Seminars, RIKEN AIP, Japan, 2022.12
ICVSS summer school, Sicily, Italy, 2022.7
Invited talk at the AI Time Seminar on NeurIPS 2021 (Virtual), China, 2022.2

Academic Service

Guest Editor: IJCV Special Issue "Audio-Visual Generation"

Organizer: ECCV 2024/ICCV 2025 Audio-Visual Generation and Learning workshop

Conference Reviewer: ICLR; ICCV; NeurIPS; ECCV; ICML; CVPR; WACV

Journal Reviewer: IEEE TKDE, TPAMI, TAI, IJCV

Education

Oct. 2019 ~ Jul. 2023. Ph.D. in Computer Science, Computer Vision Center, Autonomous University of Barcelona, Spain
Sep. 2016 ~ Jun. 2019. Master in Control Science and Technology, Huazhong University of Science and Technology, China
Sep. 2012 ~ Jun. 2016. Bachelor in Automation, Wuhan University of Science and Technology, China
