Depression is characterized by high prevalence, recurrence, disability, and mortality, and it seriously affects people’s work and daily life. Among various behavioral biomarkers, speech-based features have gained increasing attention in depression detection because they are non-invasive, affordable, and rich in affective information. However, conventional depression recognition approaches rely solely on unimodal acoustic representations and largely overlook the influe