# Xin DU
Assistant Professor,
School of Fundamental Science and Engineering,
Waseda University, Tokyo, Japan
I work with Professor Kumiko Tanaka-Ishii on complex-systems approaches to understanding natural language and large language models (LLMs). My research focuses on developing fundamentally new ways to **define, detect, and quantify complex, macroscopic behaviors in LLMs**, including phenomena such as **hallucination**, **structural degradation**, and **mode collapse in autoregressive generation**.
These behaviors *cannot* be captured by existing local metrics. Instead, they call for concepts and tools such as **fractal geometry**, **self-similarity**, **scale-free structures**, **long-range dependencies**, **emergence**, and **dynamical-systems theory**. My work introduces complexity-theoretic and information-geometric analyses to uncover the hidden structure of language and model dynamics.
Beyond fundamental research, I am also interested in the role of generative language models in practical applications, including **document clustering**, **generative information retrieval**, and **financial complex systems**.
# Selected Publications
**Language and Complexity**
- Xin Du and Kumiko Tanaka-Ishii. Correlation Dimension of Autoregressive Large Language Models. _NeurIPS 2025_ [\[arXiv\]](https://arxiv.org/abs/2510.21258)
- Xin Du and Kumiko Tanaka-Ishii. Correlation Dimension of Natural Language in a Statistical Manifold. _Physical Review Research, 2024_ [\[Site\]](https://journals.aps.org/prresearch/abstract/10.1103/PhysRevResearch.6.L022028) [\[arXiv\]](https://arxiv.org/abs/2405.06321)
- Xin Du and Kumiko Tanaka-Ishii. FIRE: Semantic Field of Words Represented as Nonlinear Functions. _NeurIPS 2022_ [\[Site\]](https://proceedings.neurips.cc/paper_files/paper/2022/hash/f08223bc8d177df6807811c32f5acfed-Abstract-Conference.html)
**Retrieval, Clustering**
- Xin Du and Kumiko Tanaka-Ishii. Information-Theoretic Generative Clustering of Documents. _AAAI 2025_ [\[Site\]](https://ojs.aaai.org/index.php/AAAI/article/view/33802) [\[arXiv\]](https://arxiv.org/abs/2412.13534)
- Xin Du, Lixin Xiu, and Kumiko Tanaka-Ishii. Bottleneck-Minimal Indexing for Generative Document Retrieval. _ICML 2024 (Oral)_ [\[Site\]](https://dl.acm.org/doi/abs/10.5555/3692070.3692542) [\[arXiv\]](https://arxiv.org/abs/2405.10974)
**Finance and Language**
- Xin Du and Kumiko Tanaka-Ishii. Stock embeddings acquired from news articles and price history, and an application to portfolio optimization. _ACL 2020_ [\[Site\]](https://aclanthology.org/2020.acl-main.307/)
- Xin Du and Kumiko Tanaka-Ishii. Stock portfolio selection balancing variance and tail risk via stock vector representation acquired from price data and texts. _Knowledge-Based Systems_ [\[Site\]](https://www.sciencedirect.com/science/article/abs/pii/S0950705122004397)
# Projects
I am currently leading one project supported by JSPS. Starting in 2026, I will lead a second project supported by NSFC.
- **Complex-Systems Approaches to Large Language Models (2026–2028)**
Supported by the National Natural Science Foundation of China.
- **Generative Information Retrieval and Indexing with Large Language Models (2025–2027)**
Supported by JSPS KAKENHI (Grant-in-Aid for Early-Career Scientists).
# Links
[\[Lab homepage\]](https://ml-waseda.jp)
[\[Google Scholar\]](https://scholar.google.com/citations?user=u-ObqvUAAAAJ&hl=ja)