# Xin DU

Assistant Professor, School of Fundamental Science and Engineering, Waseda University, Tokyo, Japan

I work with Professor Kumiko Tanaka-Ishii on complex-systems approaches to understanding natural language and large language models (LLMs). My research focuses on developing fundamentally new ways to **define, detect, and quantify complex behaviors in LLMs**, including phenomena such as **hallucination**, **structural degradation**, and **sudden collapse in autoregressive generation**. These behaviors cannot be captured by traditional, reductionist metrics. Instead, they require tools from **fractal geometry**, **self-similarity**, **scale-free structures**, **long-range dependencies**, **emergence**, and **dynamical-systems theory**. My work introduces complexity-theoretic and information-geometric analyses to uncover the hidden structure of language and model dynamics.

Beyond fundamental research, I am also interested in the role of generative language models in practical applications, including **document clustering**, **generative information retrieval**, and **financial complex systems**.

# Selected Publications

**Language and Complexity**

- Xin Du and Kumiko Tanaka-Ishii. Correlation Dimension of Autoregressive Large Language Models. _NeurIPS 2025_ [\[arXiv\]](https://arxiv.org/abs/2510.21258)
- Xin Du and Kumiko Tanaka-Ishii. Correlation Dimension of Natural Language in a Statistical Manifold. _Physical Review Research. 2024_ [\[Site\]](https://journals.aps.org/prresearch/abstract/10.1103/PhysRevResearch.6.L022028) [\[arXiv\]](https://arxiv.org/abs/2405.06321)
- Xin Du and Kumiko Tanaka-Ishii. FIRE: Semantic Field of Words Represented as Nonlinear Functions. _NeurIPS 2022_ [\[Site\]](https://proceedings.neurips.cc/paper_files/paper/2022/hash/f08223bc8d177df6807811c32f5acfed-Abstract-Conference.html)

**Retrieval, Clustering**

- Xin Du and Kumiko Tanaka-Ishii. Information-Theoretic Generative Clustering of Documents. _AAAI 2025_ [\[Site\]](https://ojs.aaai.org/index.php/AAAI/article/view/33802) [\[arXiv\]](https://arxiv.org/abs/2412.13534)
- Xin Du, Lixin Xiu, and Kumiko Tanaka-Ishii. Bottleneck-Minimal Indexing for Generative Document Retrieval. _ICML 2024 (Oral)_ [\[Site\]](https://dl.acm.org/doi/abs/10.5555/3692070.3692542) [\[arXiv\]](https://arxiv.org/abs/2405.10974)

**Finance and Language**

- Xin Du and Kumiko Tanaka-Ishii. Stock embeddings acquired from news articles and price history, and an application to portfolio optimization. _ACL 2020_ [\[Site\]](https://aclanthology.org/2020.acl-main.307/)
- Xin Du and Kumiko Tanaka-Ishii. Stock portfolio selection balancing variance and tail risk via stock vector representation acquired from price data and texts. _Knowledge-Based Systems_ [\[Site\]](https://www.sciencedirect.com/science/article/abs/pii/S0950705122004397)

# Projects

I currently lead one project supported by JSPS. Starting in 2026, I will lead another project supported by NSFC.

- **Complex-Systems Approaches to Large Language Models (2026–2028)**. Supported by the National Natural Science Foundation of China (NSFC).
- **Generative Information Retrieval and Indexing with Large Language Models (2025–2027)**. Supported by JSPS KAKENHI (Grant-in-Aid for Early-Career Scientists).

# Education and Appointments

- Assistant Professor, Department of Computer Science and Communication Engineering, Waseda University. 2025 - present
- Assistant Professor, Waseda Research Institute for Science and Engineering, Waseda University. 2023 - 2024
- JSPS Young Scientist Fellowship, Japan. 2021 - 2023
- Ph.D. in Advanced Interdisciplinary Studies, The University of Tokyo, Japan. 2020 - 2023
- M.S. in Mathematical Informatics, The University of Tokyo, Japan. 2018 - 2020
- B.S. in Industrial Engineering, Tongji University, China. 2012 - 2017

# Links

[\[Lab homepage\]](https://ml-waseda.jp) [\[Google Scholar\]](https://scholar.google.com/citations?user=u-ObqvUAAAAJ&hl=ja)