Evaluating Aging-in-Place Status of Chinese Older Adults with China Health and Retirement Longitudinal Study (CHARLS) Data
UP213 Urban Data Science Final Project
Chendi Zhang

Project Background and Research Questions
As global life expectancy rises and fertility declines, aging populations are becoming a pressing challenge worldwide (United Nations, 2023). To address the aging crisis, Aging in Place (AIP) - the ability to live independently and safely in one’s home and community - has become central in aging policy discourse (Peng & Maing, 2021; Versey, 2018). While AIP is widely endorsed for its cost-effectiveness and alignment with older adults’ heterogeneous preferences (Iecovich, 2014), research remains limited on how AIP outcomes vary spatially and which socio-demographic characteristics affect AIP in later life, particularly in underrepresented developing countries like China.
To address these gaps, this project examines how socio-demographic characteristics influence AIP status (Health Confidence Theme) among older adults in China. The project draws on the China Health and Retirement Longitudinal Study (CHARLS), a nationally representative longitudinal survey of adults aged 45 and above. CHARLS includes detailed information on demographics, family structure, health, employment, income, and assets (Zhao et al., 2020). The study applies machine learning and a traditional statistical method (Spearman's Rho correlation) to examine the relationship between individual socio-demographic characteristics, city-level spatial characteristics, and AIP status (Health Confidence Theme). This project contributes to a deeper understanding of the cumulative effects of life trajectories on AIP in a rapidly aging, spatially diverse, but understudied developing country.
Research Questions:
1. Is there a spatial disparity of AIP status (Health Confidence Theme) for older adults in China?
2. What socio-demographic characteristics impact AIP status (Health Confidence Theme) for older adults in their later life?
3. Are there any spatial factors, such as public facility numbers, associated with higher or lower city-level AIP status (Health Confidence Theme)?
Data & Method
Data:
China Health and Retirement Longitudinal Study (CHARLS) is the primary data source. [https://charls.pku.edu.cn/en/]
I also include China's city-level jurisdiction boundary shapefile [https://chinadatacenter.net/DataCategory/DataCategory.aspx?type=3] and China's city-level public facility data [https://www.webmap.cn/store.do?method=store&storeId=2].
Methods & Variables:
- Mapping the city-level AIP status (Health Confidence Theme) for older adults in China.
- Machine Learning (Random Forest) to examine if socio-demographic characteristics and public facility numbers impact older adults' AIP status.
- Spearman's Rho correlation as a comparative method to examine if socio-demographic characteristics and public facility numbers impact older adults' AIP status.

City-Level AIP_Health Theme Map
Geographical disparity of AIP_Health across Chinese cities
- Higher scores (dark red areas) are concentrated in coastal and eastern regions, such as parts of Jiangsu, Zhejiang, and Shandong. This suggests that older adults in these cities report better aging-in-place health conditions.
- Lower scores (light areas) are more common in northern and western cities, including Inner Mongolia and some parts of the northwest, indicating relatively poorer aging-in-place health outcomes.
- The spatial pattern generally aligns with China's regional development differences: cities in more economically developed coastal regions tend to show better AIP_HEALTH scores, likely due to better infrastructure, services, and social support systems.
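A minimal sketch of how city-level scores for such a map could be derived from individual records. The column names (`city`, `AIP_HEALTH`), the toy values, and the minimum-respondent threshold are illustrative assumptions, not the project's actual pipeline.

```python
import pandas as pd

# Toy individual-level records; column names and values are hypothetical.
records = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "C"],
    "AIP_HEALTH": [0.8, 0.6, 0.7, 0.4, 0.5, 0.9],
})


def city_level_scores(df, min_respondents=2):
    """Average AIP_HEALTH per city, dropping cities with too few respondents."""
    grouped = df.groupby("city")["AIP_HEALTH"].agg(mean_score="mean", n="count")
    return grouped[grouped["n"] >= min_respondents].reset_index()


city_scores = city_level_scores(records)
print(city_scores)
```

The resulting table could then be joined to the city boundary shapefile on a shared city identifier and drawn as a choropleth. Keeping the respondent count `n` next to each mean makes it easier to see which city scores rest on very few observations.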

Examine if Socio-demographic Characteristics and Public Facility Numbers Impact AIP Status (Health Theme)
Machine Learning to Examine the Relationships_Socio-demographic Characteristics



Result Interpretation
- R² = -0.0820 means that the model fails to capture meaningful variance in AIP_HEALTH using age, education, and income alone (a negative R² means the predictions are worse than simply predicting the mean). Either the variables are not strongly predictive of AIP_HEALTH, or the relationship is complex in a way the model isn’t capturing with the current training.
- In the partial dependence plots, 1) as age increases, predicted AIP_HEALTH decreases, with a sharper drop after age ~85, which aligns with expectations around declining health; 2) higher education is associated with higher predicted health; and 3) capital income fluctuates, indicating a complex relationship with AIP_HEALTH.
- According to the feature importance, capital income contributes the most to the model. Because capital income carries the most weight in the Random Forest regression yet shows an erratic relationship with AIP_HEALTH, it likely contributes to the model's poor fit (negative R²).
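To make the negative R² concrete, here is a small worked example using the standard definition R² = 1 - SS_res / SS_tot on held-out data. The numbers are invented for illustration and are not CHARLS values.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative held-out scores (not CHARLS data).
y_test = [0.2, 0.4, 0.6, 0.8]

# Predicting the test mean for every person gives R² = 0.
baseline = [0.5, 0.5, 0.5, 0.5]
# Predictions that miss by more than the mean does push R² below zero.
poor_model = [0.7, 0.3, 0.4, 0.6]

print(r_squared(y_test, baseline))
print(r_squared(y_test, poor_model))
```

A negative value is therefore a statement about predictive accuracy relative to a mean-only baseline, not a significance test.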
Machine Learning to Examine the Relationships_Public Facility Numbers



Result Interpretation
- R² = -0.3090 means that the model fails to capture meaningful variance in AIP_HEALTH using the public facility counts alone. The variables are not strongly predictive of AIP_HEALTH, the relationship is complex in a way the model isn’t capturing with the current training, or the sample size is too small (only 71).
- In the partial dependence plots, predicted AIP_HEALTH rises steeply for all four variables and then stabilizes after a certain point. This suggests that once these types of public facilities reach a certain number, further increases are not associated with changes in predicted AIP_HEALTH.
- According to the feature importance, public_phone and emergency_shelter contribute the most to the model. Public_toilet's contribution is negligible.
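With only 71 observations, an R² from a single train/test split can swing widely depending on which cities land in the test set. The sketch below shows one way to gauge that instability with repeated k-fold cross-validation in scikit-learn. The data are synthetic stand-ins (random facility counts and a random target with no real signal), and the fold and repeat counts are arbitrary choices rather than the project's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for 71 cities and four facility counts (not real data).
n_cities = 71
X = rng.poisson(lam=50, size=(n_cities, 4)).astype(float)
y = rng.uniform(0.3, 0.9, size=n_cities)  # random target: no true signal

model = RandomForestRegressor(n_estimators=100, random_state=0)
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)

# One R² per held-out fold: 5 folds x 3 repeats = 15 scores.
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")

print(f"mean R2 = {scores.mean():.3f}, std = {scores.std():.3f}")
```

Reporting the spread of the fold scores alongside the mean helps show whether a negative R² reflects a consistently poor model or mostly small-sample noise.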
---------------------------------
Traditional Regression to Examine the Relationships




Result Interpretation
- According to the Spearman's Rho coefficients and their p-values, age, education, public_toilet, public_phone, and emergency_shelter have weak to moderate positive correlations with AIP_HEALTH.
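For reference, a minimal sketch of a Spearman's Rho test with `scipy.stats.spearmanr`. The coefficient rho gives the direction and strength of the monotonic association, while the p-value only indicates how compatible the data are with no association. The values below are invented for illustration and are not CHARLS data.

```python
from scipy.stats import spearmanr

# Illustrative values only (not CHARLS data).
education_years = [0, 3, 6, 6, 9, 12, 12, 16]
aip_health = [0.35, 0.40, 0.38, 0.55, 0.50, 0.62, 0.58, 0.70]

# spearmanr ranks both variables (ties get average ranks) and correlates the ranks.
rho, p_value = spearmanr(education_years, aip_health)

print(f"rho = {rho:.3f}, p = {p_value:.4f}")
```

Reading strength from rho and evidence from the p-value keeps the two separate, since a very small p-value can accompany a weak correlation when the sample is large.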
Machine Learning and Spearman's Rho Result Interpretation, Comparison, and Reflections
Machine Learning and Spearman's Rho Result Interpretation
1. In the test between socio-demographic characteristics and AIP_HEALTH (individual level):
- R² = -0.0820 means that the model fails to capture meaningful variance in AIP_HEALTH using age, education, and income alone (a negative R² means the predictions are worse than simply predicting the mean). Either the variables are not strongly predictive of AIP_HEALTH, or the relationship is complex in a way the model isn’t capturing with the current training.
- In the partial dependence plots, 1) as age increases, predicted AIP_HEALTH decreases, with a sharper drop after age ~85, which aligns with expectations around declining health; 2) higher education is associated with higher predicted health; and 3) capital income fluctuates, indicating a complex relationship with AIP_HEALTH.
- According to the feature importance, capital income contributes the most to the model. Because capital income carries the most weight in the Random Forest regression yet shows an erratic relationship with AIP_HEALTH, it likely contributes to the model's poor fit (negative R²).
2. In the test between public facility numbers and AIP_HEALTH (city-level):
- R² = -0.3090 means that the model fails to capture meaningful variance in AIP_HEALTH using the public facility counts alone. The variables are not strongly predictive of AIP_HEALTH, the relationship is complex in a way the model isn’t capturing with the current training, or the sample size is too small (only 71).
- In the partial dependence plots, predicted AIP_HEALTH rises steeply for all four variables and then stabilizes after a certain point. This suggests that once these types of public facilities reach a certain number, further increases are not associated with changes in predicted AIP_HEALTH.
- According to the feature importance, public_phone and emergency_shelter contribute the most to the model. Public_toilet's contribution is negligible.
3. In the traditional Spearman's Rho test (individual and county-level respectively):
- According to the Spearman's Rho coefficients and their p-values, age, education, public_toilet, public_phone, and emergency_shelter have weak to moderate positive correlations with AIP_HEALTH.
Comparison and Reflections
1. Machine Learning (Random Forest Regression)
- Strengths: It captures nonlinear relationships and interactions between variables. It also provides visual tools like partial dependence plots and feature importance, which help interpret the influence of each predictor.
- Limitations in this case: It performed poorly (negative R² values) for both the individual-level and city-level models.
2. Spearman’s Rho (Correlation Test)
- Strengths: It is a simple and robust method that measures monotonic relationships between two variables. It identified weak-to-moderate positive correlations between AIP_HEALTH and variables like age, education, public phones, and emergency shelters. It can also handle small sample sizes.
- Limitations in this case: It only tests pairwise relationships; it doesn’t account for interactions or control for other variables. It also cannot model complex, combined effects or provide predictions like machine learning can.
3. Reflections
- Should rural counties be kept rather than dropped when processing the data for each county, to increase the sample size for better prediction? If so, how can keeping them be justified?
- What would happen if AIP_HEALTH were converted into a binary variable, for example 1 if ≥0.6 and 0 if <0.6? Would a Random Forest model (as a classifier rather than a regressor) work better?
- In a future study, measure the density of public facilities, or use city-level economic factors as weights, rather than relying only on the absolute number of public facilities.
- When interpreting the Random Forest results, even if the R² is low (the model does not fit well), can I still interpret the partial dependence and feature importance plots based on the meanings and narratives behind the variables?
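As a starting point for the binary-outcome question above, the sketch below shows how the 0.6 threshold could be applied and a Random Forest classifier evaluated against a majority-class baseline. The data are synthetic (random predictors and scores, not CHARLS), so the printed numbers say nothing about how the real model would perform. The sample size, number of trees, and fold count are arbitrary assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in data (not CHARLS): three predictors and a continuous score.
X = rng.normal(size=(200, 3))
aip_health = rng.uniform(0.0, 1.0, size=200)

# Binarise at the 0.6 threshold: 1 if score >= 0.6, else 0.
y_binary = (aip_health >= 0.6).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
accuracy = cross_val_score(clf, X, y_binary, cv=5, scoring="accuracy")

# Share of the majority class: the accuracy of always predicting that class.
majority_share = max(y_binary.mean(), 1 - y_binary.mean())

print(f"accuracy = {accuracy.mean():.3f}, majority baseline = {majority_share:.3f}")
```

Comparing accuracy with the majority-class share plays the same role as comparing R² with zero in the regression setting: a classifier that does not beat this baseline has not learned anything useful from the predictors.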
ACKNOWLEDGEMENT AND REFERENCE
Instructor:
Adam Millard-Ball
Main Reference:
Ahn, M., Kang, J., & Kwon, H. J. (2020). The concept of aging in place as intention. The Gerontologist, 60(1), 50–59. https://doi.org/10.1093/geront/gnz120
Iecovich, E. (2014). Aging in place: From theory to practice. Anthropological Notebooks, 20(1), 21–33.
Lei, X., Hu, Y., McMurray, J., & Smith, J. P. (2012). Gender patterns of health insurance coverage in China. China Economic Review, 23(1), 114–129. https://doi.org/10.1016/j.chieco.2011.08.006
Peng, S., & Maing, M. (2021). Influential factors of age-friendly neighborhood open space under high-density high-rise housing context in hot weather: A case study of public housing in Hong Kong. Cities, 115, 103231. https://doi.org/10.1016/j.cities.2021.103231
United Nations. (2023). Aging. https://www.un.org/en/global-issues/aging
Versey, H. S. (2018). A tale of two Harlems: Gentrification, social capital, and implications for aging in place. Social Science & Medicine, 214, 1–11. https://doi.org/10.1016/j.socscimed.2018.07.024
Wang, S., & Chen, X. (2022). Informal caregiving and mental health among older adults in China: Evidence from CHARLS. BMC Geriatrics, 22, 372. https://doi.org/10.1186/s12877-022-03099-3
Zhang, Z., & Wu, Y. (2020). Early-life conditions and cognitive functioning among middle-aged and older adults in China: The moderating role of age. The Journals of Gerontology: Series B, 75(9), 1996–2007. https://doi.org/10.1093/geronb/gbaa039
Zhao, Y., Strauss, J., Chen, X., Wang, Y., Gong, J., Meng, Q., Wang, G., & Wang, H. (2020). China Health and Retirement Longitudinal Study Wave 4 User’s Guide. National School of Development, Peking University. https://charls.charlsdata.com/pages/data/111/en.html
Zhao, Y., Strauss, J., Chen, X., Wang, Y., Gong, J., Meng, Q., Wang, G., & Wang, H. (2014). China Health and Retirement Longitudinal Study—CHARLS: Introduction. International Journal of Epidemiology, 43(1), 61–68. https://doi.org/10.1093/ije/dyt203