CIKM '25: Proceedings of the 34th ACM International Conference on Information and Knowledge Management
SESSION: Keynote Talks
The Geometry of Knowledge and Computational Discovery
Modern neural networks transform vast datasets into continuous embedding spaces, translating semantic relationships into geometric structures. The key to unlocking their full potential lies in making these representations interpretable. This talk presents a simple theory underlying the interpretability of the embedding space and shows how this principle allows us to analyze the data, design new metrics, and model the dynamics of the system, moving beyond black-box models toward data-driven, interpretable insights.
AI Planning for Data Exploration
Data Exploration is an incremental process that helps users express what they want through a conversation with the data. A large body of work has focused on automating data exploration (e.g., to explore very large galaxy data in SDSS [6, 7], to summarize large datasets [8, 9], or to explore ratings [2, 3] and search for products [5]). Reinforcement Learning (RL) is one of the most notable approaches to automating data exploration, and several solutions have been proposed. With the advent of Large Language Models and their ability to reason sequentially, it has become legitimate to ask: would LLMs and, more generally, AI planning outperform a customized RL policy in data exploration [1]? More specifically, would LLMs help circumvent retraining for new tasks and strike a balance between specificity and generality [4]? This talk attempts to answer these questions by reviewing RL training and policy reusability for data exploration. It starts with an overview of exploratory data analysis and the various uses of RL to automate this online decision-making process. I will then introduce AI planning and the need for policy reusability in RL. The last part of the talk discusses pressing questions on AI planning applied to data exploration, including memory management, evaluation, and responsible deployment.
Numerical Linear Algebraic Foundations for Large-Scale Unsupervised Learning
Numerical Linear Algebra provides essential foundations for many large-scale data analytic tasks. This talk illustrates that some of the most powerful methods for unsupervised tasks such as clustering, topic modeling, community detection, embedding, and representation learning can be derived from a common framework of low-rank approximation (LRA). These include the ubiquitous singular value decomposition (SVD), latent semantic indexing (LSI), principal component analysis (PCA), and constrained LRA (CLRA)-based methods such as nonnegative matrix factorization (NMF) and its variants, including Symmetric NMF (SymNMF) and JointNMF. All of these methods can be explained within one framework, which can then be further generalized into more advanced methods such as co-clustering and co-embedding for more complex situations, including multi-view and multi-granularity data sets, and into semi-supervised methods that incorporate prior knowledge. The presented algorithms, which build on advances in numerical linear algebra, achieve scalability, efficiency, and effectiveness. Substantial experimental results on synthetic and real-life problems illustrate the significant benefits of exploiting numerical linear algebra-based methods in many data analytic tasks.
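For readers unfamiliar with the LRA framework the talk builds on, the sketch below contrasts unconstrained low-rank approximation (truncated SVD, the basis of LSI/PCA-style methods) with constrained LRA (NMF) on a toy matrix. It is a generic illustration of these standard decompositions, not the speaker's algorithms; the data, rank, and solver settings are arbitrary.

```python
# Generic low-rank approximation (LRA) illustration: truncated SVD vs. NMF.
# Requires numpy and scikit-learn.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = np.abs(rng.standard_normal((100, 50)))  # toy nonnegative data matrix
k = 5                                       # target rank

# Unconstrained LRA: truncated SVD (the basis of LSI/PCA-style methods).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_svd = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Constrained LRA: NMF, X ~ W H with W, H >= 0 (basis of topic modeling/clustering).
model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)
H = model.components_
X_nmf = W @ H

print("SVD reconstruction error:", np.linalg.norm(X - X_svd, "fro"))
print("NMF reconstruction error:", np.linalg.norm(X - X_nmf, "fro"))
```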
SESSION: Full Research Papers
ADMP-GNN: Adaptive Depth Message Passing GNN
Graph Neural Networks (GNNs) have proven to be highly effective in various graph learning tasks. A key characteristic of GNNs is their use of a fixed number of message-passing steps for all nodes in the graph, regardless of each node's diverse computational needs and characteristics. Through empirical analysis of real-world data, we demonstrate that the optimal number of message-passing layers varies for nodes with different characteristics. This finding is further supported by experiments on synthetic datasets. To address this, we propose Adaptive Depth Message Passing GNN (ADMP-GNN), a novel framework that dynamically adjusts the number of message-passing layers for each node, resulting in improved performance. This approach applies to any model that follows the message-passing scheme. We evaluate ADMP-GNN on the node classification task and observe performance improvements over baseline GNN models. Our code is publicly available at: https://github.com/abbahaddou/ADMP-GNN
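As a rough illustration of the general idea of per-node message-passing depth (not the ADMP-GNN architecture itself), the sketch below runs several mean-aggregation steps, keeps every node's representation at every depth, and then selects a depth per node. The random depth choice is a stand-in for whatever learned selection rule a model would use.

```python
# Generic per-node depth selection over stacked message-passing steps.
import numpy as np

def propagate(A_hat, H, num_layers):
    """Return node representations after 0..num_layers mean-aggregation steps."""
    per_layer = [H]
    for _ in range(num_layers):
        H = A_hat @ H              # one message-passing step (normalized adjacency)
        per_layer.append(H)
    return per_layer               # list of (N, d) arrays, index = depth

rng = np.random.default_rng(0)
N, d, L = 6, 4, 3
A = (rng.random((N, N)) < 0.4).astype(float)
A = np.maximum(A, A.T); np.fill_diagonal(A, 1.0)      # symmetric, with self-loops
A_hat = A / A.sum(axis=1, keepdims=True)              # row-normalized adjacency
H0 = rng.standard_normal((N, d))

layers = propagate(A_hat, H0, L)
depth_per_node = rng.integers(0, L + 1, size=N)       # stand-in for a learned choice
H_out = np.stack([layers[depth_per_node[i]][i] for i in range(N)])
print(H_out.shape)                                    # (6, 4): each node uses its own depth
```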
EvenOddML: Even and Odd Aggregation with Multi-Level Contrastive Learning for Bipartite Graph
Bipartite graphs, which model relationships between two distinct entity types, are common in various applications. Existing bipartite graph neural networks (GNNs) often fail to capture both local and global structures and inadequately consider the indirect and direct influences from same-type nodes, leading to suboptimal performance. To address these issues, we propose EvenOddML, a contrastive-learning-based node representation learning model for bipartite graphs. The model comprises an Even-Odd encoder (an even-N-odd aggregation module) that aggregates information from immediate neighbors as well as 2-hop neighbors, both directly and indirectly. We also introduce a novel three-level contrastive learning framework (Layer Level, Type-Global Level, and Network-Global Level) that hierarchically maximizes mutual information by integrating local and global information at various scales. We evaluate EvenOddML on recommendation and link prediction tasks, showing its effectiveness over state-of-the-art methods in bipartite graph representation learning.
Quantization Aware Matryoshka Adaptation: Leveraging Matryoshka Learning, Quantization, and Bitwise Operations for Reduced Storage and Improved Retrieval Speed
We introduce Quantization Aware Matryoshka Adaptation (QAMA), a unified framework for creating compact yet semantically rich embeddings through Matryoshka Representation Learning and multi-level quantization. Our approach learns nested embeddings that gracefully shrink to smaller dimensional subsets and leverages bitwise operations (XOR, NOT, POPCOUNT) for efficient retrieval. By augmenting transformer-based encoders with lightweight feedforward layers and specialized regularization (Matryoshka Loss, Orthogonality, Information Bottleneck, and Quantization Loss), we produce quantization-friendly representations that preserve essential information in early dimensions. We explore 0.5-bit, 1-bit, 1.5-bit, and 2-bit quantization levels, as well as a Hybrid Quantization scheme that adaptively allocates higher precision to dimensions with greater information content. Across extensive evaluations on ModernBERT (MB) and MiniLM models, 2-bit quantization and Hybrid Quantization consistently recover up to 95-98% of the original full-precision (FP32) performance while reducing memory usage by over 90%. Even at low embedding dimensions (e.g., 96-192), QAMA's hierarchical training keeps performance surprisingly robust, highlighting the effectiveness of our bit-level expansions and nested representation learning. Our proposed end-to-end training with quantization-aware loss yields embeddings that map cleanly into discrete levels, supporting rapid Hamming distance calculations for semantic similarity search using bitwise operations. QAMA offers a practical means to optimize embedding storage and retrieval speed for information retrieval systems.
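The bitwise retrieval primitive the abstract mentions is standard: sign-binarize embeddings, pack the bits into bytes, and rank candidates by Hamming distance computed with XOR plus a population count. The sketch below shows that primitive under an assumed 1-bit quantization; it is not QAMA's multi-level scheme, and the corpus size and dimensions are arbitrary.

```python
# Hamming-distance retrieval over packed binary embeddings (XOR + popcount).
import numpy as np

def binarize_and_pack(X):
    """Sign-binarize float embeddings and pack 8 bits per byte."""
    bits = (X > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)

def hamming_search(query_packed, db_packed, top_k=5):
    """Rank database vectors by Hamming distance to the query."""
    xor = np.bitwise_xor(db_packed, query_packed)       # differing bits
    dist = np.unpackbits(xor, axis=1).sum(axis=1)       # popcount per row
    order = np.argsort(dist)[:top_k]
    return order, dist[order]

rng = np.random.default_rng(0)
db = binarize_and_pack(rng.standard_normal((10_000, 384)))   # toy corpus
q = binarize_and_pack(rng.standard_normal((1, 384)))
idx, dists = hamming_search(q, db)
print(idx, dists)
```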
Efficient Knowledge Transfer from Large to Small Language Models via Low-Overhead Query Mechanism
Small language models offer computational efficiency but often lack the performance of larger models. We introduce a novel query mechanism enabling small models to efficiently extract knowledge from large models during inference. Our approach executes the large model on a single vector prompt, significantly reducing computational overhead compared to full model execution. Using a 1B-parameter Llama 3.2 as the small model and 3B/8B Llama models as knowledge sources, we evaluate on 20 diverse benchmarks spanning reasoning, factual recall, and reading comprehension tasks. Our best method achieves substantial improvements over the small model baseline, with particularly strong gains on factual memory tasks (average +114.9% relative improvement). Notable results include improvements on TriviaQA (74.4% vs 35.4% baseline), Freebase Questions (42.5% vs 14.6%), and Natural Questions (34.9% vs 12.6%). Our approach consistently outperforms traditional fine-tuning methods while maintaining efficiency, achieving improvements of over 41% across multiple tasks while incurring only 31% additional compute over the 1B baseline.
Mixed data k-Anonymization by Consistent Maximal Association and Microaggregation
This paper addresses the challenge of anonymizing mixed data, comprising both categorical (qualitative) and numerical (continuous) variables, while preserving data utility. The inherent heterogeneity of such data complicates the use of traditional anonymization methods. To overcome this limitation, we propose a novel microaggregation-based framework for k-anonymization that integrates statistical association measures applicable to both variable types, ensuring a coherent and consistent treatment. Our approach, called Mix-R2, relies on a unified set of core concepts grounded in analysis of variance, enabling the application of a common methodology to both categorical and numerical attributes. By leveraging these consistent association measures, the framework improves the robustness of the k-anonymization process, delivering strong privacy protection while maintaining high data utility. Numerical experiments on benchmark datasets demonstrate the effectiveness and advantages of our method, highlighting its contribution to privacy-preserving analysis of mixed-type data.
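As background, microaggregation-based k-anonymization groups records into clusters of at least k members and releases each record as its group's aggregate. The sketch below is a bare-bones, single-attribute-ordering version of that idea on a toy table; it does not implement Mix-R2 or its association measures, and the grouping heuristic is an assumption for illustration only.

```python
# Minimal microaggregation: groups of >= k records, released as group aggregates.
import numpy as np
import pandas as pd

def microaggregate(df, numeric_cols, categorical_cols, k=3):
    out = df.copy()
    out[numeric_cols] = out[numeric_cols].astype(float)
    # Order records along one numeric attribute and cut into groups of at least k.
    order = df[numeric_cols[0]].argsort().to_numpy()
    groups = [order[i:i + k] for i in range(0, len(df) - len(df) % k, k)]
    if len(df) % k:                                  # fold any remainder into the last group
        groups[-1] = np.concatenate([groups[-1], order[-(len(df) % k):]])
    for g in groups:
        rows = df.index[g]
        for c in numeric_cols:                       # numeric: replace with group mean
            out.loc[rows, c] = df.iloc[g][c].mean()
        for c in categorical_cols:                   # categorical: replace with group mode
            out.loc[rows, c] = df.iloc[g][c].mode().iloc[0]
    return out                                       # each released record is shared by >= k rows

df = pd.DataFrame({"age": [23, 25, 31, 40, 41, 47, 52],
                   "city": ["A", "A", "B", "B", "B", "C", "C"]})
print(microaggregate(df, ["age"], ["city"], k=3))
```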
LLM-Powered Information Extraction for the Dairy Financial Domain: Tackling Data Scarcity and Ambiguity
Information extraction is a critical technology for intelligent analysis and risk assessment in the dairy financial domain. However, real-world applications face three major challenges: the complexity and diversity of entity-relation types, significant data imbalance, and ambiguity in textual expressions. Traditional methods often fail to capture rare patterns, struggle with vague mentions, and exhibit poor generalization in low-resource settings. To address these issues, we propose a novel framework that integrates large language models (LLMs) with targeted data augmentation and agent-based retrieval-augmented generation (RAG). Our approach builds on the BaiChuan2 model, which is first adapted to the dairy finance domain via secondary pretraining. We introduce a two-stage data augmentation strategy: the first stage uses ChatGPT to generate pseudo-samples for rare types, and the second stage refines model weaknesses based on prediction-guided feedback. These augmented datasets are used to fine-tune the model through prompt-based supervised learning with LoRA. To further enhance robustness, we incorporate an agent-based RAG module that completes vague or underspecified entities by retrieving external contextual knowledge. Extensive experiments demonstrate that our framework achieves state-of-the-art performance, with F1+ scores of 0.876 and 0.824 for entity recognition and relation extraction, respectively. The RAG component boosts entity completion accuracy to 0.802 while reducing retrieval latency by over 6x, showcasing both the effectiveness and practicality of our method in real-world dairy financial applications.
RELINK: Edge Activation for Closed Network Influence Maximization via Deep Reinforcement Learning
Influence Maximization aims to select a subset of elements in a social network to maximize information spread under a diffusion model. While existing work primarily focuses on selecting influential nodes, these approaches assume unrestricted message propagation, an assumption that fails in closed social networks, where content visibility is constrained and node-level activations may be infeasible. Motivated by the growing adoption of privacy-focused platforms such as Signal, Discord, Instagram, and Slack, our work addresses the following fundamental question: how can we learn effective edge activation strategies for influence maximization in closed networks? To answer this question, we introduce Reinforcement Learning for Link Activation (RELINK), the first deep reinforcement learning (DRL) framework for edge-level influence maximization in privacy-constrained networks. It models edge selection as a Markov Decision Process, where the agent learns to activate edges under budget constraints. Unlike prior node-based DRL methods, RELINK uses an edge-centric Q-learning approach that accounts for structural constraints and constrained information propagation. Our framework combines a rich node embedding pipeline with an edge-aware aggregation module. The agent is trained using an n-step Double DQN objective, guided by dense reward signals that capture marginal gains in influence spread. Extensive experiments on real-world networks show that RELINK consistently outperforms existing edge-based methods, achieving up to 15% higher influence spread and improved scalability across diverse settings.
Learning Global-Local Multi-Scale Node Embeddings with Random Walks and Landmark-Guided Optimization
Learning low-dimensional representations for nodes is fundamental to various graph learning tasks. Existing approaches often struggle to comprehensively capture the connections between local structure, contextual information, and global node positioning in the graph. This paper introduces GLOW, a novel approach for node representation learning that jointly maximizes the likelihood of preserving these crucial aspects of the graph structure. GLOW integrates 1) local structural and contextual information, captured through random walks, with 2) global node positioning, approximated using a set of selected landmark nodes. This strategy allows GLOW to learn node embeddings that are structurally aware and informative for downstream tasks. We evaluate GLOW's performance on several benchmark datasets for node classification and link prediction, and compare it to a wide range of established baseline methods. The obtained results consistently demonstrate that GLOW achieves substantial performance improvements. This highlights the advantages of integrating local and global structural learning in effectively capturing graph structure and generating highly resilient node embeddings.
Dynamic Triangulation-Based Graph Rewiring for Graph Neural Networks
Graph Neural Networks (GNNs) have emerged as the leading paradigm for learning over graph-structured data. However, their performance is limited by issues inherent to graph topology, most notably oversquashing and oversmoothing. Recent advances in graph rewiring aim to mitigate these limitations by modifying the graph topology to promote more effective information propagation. In this work, we introduce TRIGON, a novel framework that constructs enriched, non-planar triangulations by learning to select relevant triangles from multiple graph views. By jointly optimizing triangle selection and downstream classification performance, our method produces a rewired graph with markedly improved structural properties such as reduced diameter, increased spectral gap, and lower effective resistance compared to existing rewiring methods. Empirical results demonstrate that TRIGON outperforms state-of-the-art approaches on node classification tasks across a range of homophilic and heterophilic benchmarks.
Extreme Multi-Label Completion for Semantic Document Tagging with Taxonomy-Aware Parallel Learning
The objective of Extreme Multi-Label Completion (XMLCo) is to predict missing document labels drawn from a very large collection. Together with Extreme Multi-Label Classification (XMLC), XMLCo is arguably one of the most challenging document classification tasks, as the number of potential labels is generally very large compared to the number of labeled documents. The collection of labels is often structured in a taxonomy that encodes relationships between labels, and many methods have been proposed to leverage this hierarchy to improve XMLCo algorithms. In this paper, we propose a new approach to this problem: TAMLEC (Taxonomy-Aware Multi-task Learning for Extreme multi-label Completion). TAMLEC divides the problem into several Taxonomy-Aware Tasks, i.e., specific subsets of the labels drawn from paths in the taxonomy, and trains on these tasks using a dynamic Parallel Feature sharing approach in which parts of the model are shared between tasks while others are task-specific. At inference time, TAMLEC uses the labels available in a document to predict missing labels, exploiting the Weak-Semilattice structure that is naturally induced by the tasks. Our empirical evaluation on real-world datasets shows that TAMLEC substantially outperforms the state of the art in XMLCo. Furthermore, additional experiments show that TAMLEC is particularly suited to few-shot settings, where new tasks or labels are introduced with only a few examples after initial training.
Relation-Faceted Graph Pooling with LLM Guidance for Dynamic Span-Aware Information Extraction
Joint information extraction aims to convert unstructured text into structured knowledge by identifying entities and their relations. However, existing methods often rely on static span formation and relation-agnostic validation, limiting their ability to capture dynamic, context-sensitive semantics. We present RePooL, a hierarchical validation framework that performs fine-grained token-level filtering followed by coarse-grained span-level validation, enabling robust multi-granular semantic modeling. RePooL constructs a dual-view knowledge graph that models tokens and relations as distinct node types. It leverages auxiliary structural relations to encode token-relation semantic compatibility via subject and object roles and to compose multi-token spans dynamically, thereby enabling relation-aware validation across multiple granularities. To further strengthen semantic grounding, RePooL incorporates LLM-guided alignment, which evaluates candidate triples against the input text to specifically reinforce coherent extractions. Extensive experiments on standard IE benchmarks show that RePooL achieves superior performance, demonstrating its effectiveness in modeling fine-grained entity-relation interactions.
MUFFIN: Mixture of User-Adaptive Frequency Filtering for Sequential Recommendation
Sequential recommendation (SR) aims to predict users' subsequent interactions by modeling their sequential behaviors. Recent studies have explored frequency-domain analysis, which effectively models periodic patterns in user sequences. However, existing frequency-domain SR models still face two major drawbacks: (i) limited frequency band coverage, often missing critical behavioral patterns in a specific frequency range, and (ii) lack of personalized frequency filtering, as they apply an identical filter to all users regardless of their distinct frequency characteristics. To address these challenges, we propose a novel frequency-domain model, Mixture of User-adaptive Frequency FIlteriNg (MUFFIN), operating through two complementary modules. (i) The global filtering module (GFM) handles the entire frequency spectrum to capture comprehensive behavioral patterns. (ii) The local filtering module (LFM) selectively emphasizes important frequency bands without excluding information from other ranges. In both modules, a user-adaptive filter (UAF) is adopted to generate user-specific frequency filters tailored to individual characteristics. Finally, by aggregating both modules, MUFFIN captures diverse user behavioral patterns across the full frequency spectrum. Extensive experiments show that MUFFIN consistently outperforms state-of-the-art frequency-domain SR models on five benchmark datasets. The source code is available at https://github.com/ilwoong100/MUFFIN.
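The frequency-domain building block such models operate on is a filter applied to the spectrum of a user's interaction sequence. The sketch below shows that generic operation (FFT along the time axis, per-frequency scaling, inverse FFT), with a random filter standing in for the learned, user-adaptive one described in the abstract; it illustrates the mechanism, not MUFFIN's modules.

```python
# Generic frequency-domain filtering of a user interaction sequence.
import numpy as np

def frequency_filter(seq_emb, filt):
    """seq_emb: (seq_len, dim) item-embedding sequence; filt: (seq_len//2 + 1,) weights."""
    spec = np.fft.rfft(seq_emb, axis=0)             # (seq_len//2 + 1, dim) complex spectrum
    spec *= filt[:, None]                           # scale each frequency band
    return np.fft.irfft(spec, n=seq_emb.shape[0], axis=0)

rng = np.random.default_rng(0)
seq_len, dim = 50, 64
seq_emb = rng.standard_normal((seq_len, dim))
filt = rng.random(seq_len // 2 + 1)                 # stand-in for a learned, user-specific filter
filtered = frequency_filter(seq_emb, filt)
print(filtered.shape)                               # (50, 64), back in the time domain
```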
Subclass-Aware Inclusive Classifier via Repulsive Hidden Strata
Classification models in machine learning are typically trained using coarse-grained class labels. Although these models often achieve strong overall accuracy, their performance is uneven across latent subclasses, a common phenomenon called hidden stratification. The latent subclasses within each class generally differ substantially in distribution and characteristics, resulting in poor generalization for underrepresented groups. Moreover, imbalanced subclass distributions let majority subclasses dominate training, yielding biased and less reliable models, especially in safety-critical applications (such as medicine). To address these challenges, we propose a novel framework that uncovers hidden subclasses via a repulsive point process and then leverages these fine-grained labels to make the classifier more inclusive across subclasses. Our approach identifies subclasses without requiring additional supervision, thereby promoting diversity and reducing sensitivity to subclass imbalance. Extensive experiments on four benchmark datasets demonstrate consistent and significant improvements over state-of-the-art baselines across both balanced and imbalanced subclass distributions, underscoring the effectiveness and generalizability of our approach.
A Robust Clustered Federated Learning Approach for Non-IID Data with Quantity Skew
Federated Learning (FL) is a decentralized paradigm that enables a client-server architecture to collaboratively train a global Artificial Intelligence model without sharing raw data, thereby preserving privacy. A key challenge in FL is Non-IID data. Quantity Skew (QS) is a particular form of the Non-IID problem in which clients hold highly heterogeneous data volumes. Clustered Federated Learning (CFL) is an emerging variant of FL that offers a promising solution to the Non-IID problem: it improves model performance by grouping clients with similar data distributions into clusters. CFL methods generally fall into two operating strategies. In the first, clients select the cluster that minimizes their local training loss; in the second, the server groups clients based on local model similarities. However, most CFL methods have not been systematically evaluated under QS, even though it poses significant challenges for them. In this paper, we present two main contributions. The first is an evaluation of state-of-the-art CFL algorithms under various Non-IID settings, applying multiple QS scenarios to assess their robustness. The second is a novel iterative CFL algorithm, named CORNFLQS, which optimally coordinates both operating strategies of CFL and is robust to different variations of QS settings. We conducted intensive experiments on six image classification datasets, resulting in 270 Non-IID configurations. The results show that CORNFLQS achieves the highest average ranking in both accuracy and clustering quality, as well as strong robustness to QS perturbations. Overall, our approach outperforms existing CFL algorithms.
Large Model Annotation-Enhanced Spatio-Temporal Fusion Knowledge Tracing Model
Knowledge Tracing (KT) aims to model students' evolving knowledge states based on their interactions, supporting downstream applications such as personalized resource recommendation. Recent works have leveraged large language models (LLMs) to annotate spatial topological structures (e.g., prerequisite and hierarchical relations) between knowledge concepts, significantly reducing manual annotation costs. However, two major challenges remain: (1) low interpretability and potential unreliability due to LLM-induced annotation errors or hallucinations, and (2) limited cross-hierarchical interactions that hinder model performance. To address these issues, we propose a Large Model Annotation-Enhanced Spatio-Temporal Fusion KT model. First, we introduce a Detection-Reannotation Strategy to mitigate LLM annotation errors and hallucinations, resulting in a more accurate Knowledge Concept (KC) relation graph. Second, we present a Unit Relation Graph Annotation Method to reduce the distance between cross-hierarchical nodes, thereby enhancing their interactions. Lastly, we propose a Spatio-Temporal Fusion Framework, incorporating a dual-view contrastive learning module and a graph-structured knowledge state propagation module, to more effectively model students' knowledge propagation. Experiments on three real-world educational datasets demonstrate that our method effectively improves the practicality and reliability of LLM annotations, achieving state-of-the-art performance. Moreover, the knowledge propagation process based on the annotated graph enhances interpretability for educational applications.
ProtoEHR: Hierarchical Prototype Learning for EHR-based Healthcare Predictions
Digital healthcare systems have enabled the collection of mass healthcare data in electronic healthcare records (EHRs), allowing artificial intelligence solutions for various healthcare prediction tasks. However, existing studies often focus on isolated components of EHR data, limiting their predictive performance and interpretability. To address this gap, we propose ProtoEHR, an interpretable hierarchical prototype learning framework that fully exploits the rich, multi-level structure of EHR data to enhance healthcare predictions. More specifically, ProtoEHR models relationships within and across three hierarchical levels of EHRs: medical codes, hospital visits, and patients. We first leverage large language models to extract semantic relationships among medical codes and construct a medical knowledge graph as the knowledge source. Building on this, we design a hierarchical representation learning framework that captures contextualized representations across the three levels, while incorporating prototype information within each level to capture intrinsic similarities and improve generalization. For a comprehensive assessment, we evaluate ProtoEHR on two public datasets across five clinically significant tasks: mortality prediction, readmission prediction, length-of-stay prediction, drug recommendation, and phenotype prediction. The results demonstrate the ability of ProtoEHR to make accurate, robust, and interpretable predictions compared to baselines in the literature. Furthermore, ProtoEHR offers interpretable insights at the code, visit, and patient levels to aid healthcare prediction.
Force Matching with Relativistic Constraints: A Physics-Inspired Approach to Stable and Efficient Generative Modeling
This paper introduces Force Matching (ForM), a novel framework for generative modeling that represents an initial exploration into leveraging special relativistic mechanics to enhance the stability of the sampling process. By incorporating the Lorentz factor, ForM imposes a velocity constraint, ensuring that sample velocities remain bounded within a constant limit. This constraint serves as a fundamental mechanism for stabilizing the generative dynamics, leading to a more robust and controlled sampling process. We provide a rigorous theoretical analysis demonstrating that the velocity constraint is preserved throughout the sampling procedure within the ForM framework. To validate the effectiveness of our approach, we conduct extensive empirical evaluations. On the half-moons dataset, ForM significantly outperforms baseline methods, achieving the lowest Euclidean distance loss of 0.714, in contrast to vanilla first-order flow matching (5.853) and first- and second-order flow matching (5.793). Additionally, we perform an ablation study to further investigate the impact of our velocity constraint, reaffirming the superiority of ForM in stabilizing the generative process. The theoretical guarantees and empirical results underscore the potential of integrating special relativity principles into generative modeling. Our findings suggest that ForM provides a promising pathway toward achieving stable, efficient, and flexible generative processes. This work lays the foundation for future advancements in high-dimensional generative modeling, opening new avenues for the application of physical principles in machine learning.
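For reference, the special-relativistic relations behind the bounded-velocity idea are standard physics (the paper's exact parameterization may differ): the Lorentz factor diverges as the speed approaches c, so a velocity expressed through a finite momentum can never exceed c.

```latex
% Standard special-relativistic relations (textbook physics; the paper's
% exact parameterization may differ).
\[
  \gamma(v) \;=\; \frac{1}{\sqrt{1 - \lVert v\rVert^{2}/c^{2}}},
  \qquad
  p \;=\; \gamma(v)\, m\, v,
  \qquad
  v \;=\; \frac{p\, c^{2}}{\sqrt{\lVert p\rVert^{2} c^{2} + m^{2} c^{4}}},
\]
\[
  \text{so that}\quad
  \lVert v\rVert
  \;=\; \frac{\lVert p\rVert\, c}{\sqrt{\lVert p\rVert^{2} + m^{2} c^{2}}}
  \;<\; c
  \quad\text{for any finite momentum } p .
\]
```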
Dynamic Graph Learning via Historical Information Perception and Multi-Granular Temporal Curriculum Learning
Dynamic graph representation learning has emerged as a pivotal paradigm for modeling time-varying relational patterns in complex systems ranging from social networks to urban mobility. While existing methods achieve notable progress in temporal modeling, one critical challenge remains insufficiently addressed: identifying dual temporal evolution, i.e., instantaneous states and evolutionary trajectories. To address this challenge, we propose HMGNN, a novel dynamic graph learning framework that harmoniously integrates temporal dynamics modeling with stable structural representation learning, allowing adaptive pattern discovery while preserving feature consistency in evolving environments. First, we propose a dynamic model that integrates a historical information perception module and a temporal aggregation module; it injects historical information into the model and adaptively measures the impact of instantaneous and historical information through the aggregation function. Second, we devise a dual-component model learning framework comprising contrastive learning and multi-granular temporal curriculum learning to holistically capture evolutionary dynamics. The contrastive learning component employs continuous-view contrastive alignment to preserve stable node features across temporal evolution. Complementarily, our multi-granular temporal curriculum learning introduces a masking mechanism to explicitly learn evolution patterns over different time intervals. Extensive experiments demonstrate the significant superiority of HMGNN over state-of-the-art dynamic graph learning methods in terms of all evaluation metrics.
Enhancing Contrastive Link Prediction With Edge Balancing Augmentation
Link prediction is one of the most fundamental tasks in graph mining, which has motivated recent studies that leverage contrastive learning to enhance performance. However, we observe two major weaknesses in these studies: i) the lack of theoretical analysis of contrastive learning for link prediction, and ii) inadequate consideration of node degrees in contrastive learning. To address these weaknesses, we provide the first formal theoretical analysis of contrastive learning for link prediction, and our analysis results generalize to autoencoder-based link prediction models with contrastive learning. Motivated by these results, we propose a new graph augmentation approach, Edge Balancing Augmentation (EBA), which adjusts node degrees in the graph as the augmentation. We then propose a new approach, Contrastive Link Prediction with Edge Balancing Augmentation (CoEBA), which integrates the proposed EBA with new contrastive losses to improve model performance. We conduct experiments on 8 benchmark datasets. The results demonstrate that our proposed CoEBA significantly outperforms the other state-of-the-art link prediction models.
PromptTSS: A Prompting-Based Approach for Interactive Multi-Granularity Time Series Segmentation
Multivariate time series data, collected across various fields such as manufacturing and wearable technology, exhibit states at multiple levels of granularity, from coarse-grained system behaviors to fine-grained, detailed events. Effectively segmenting and integrating states across these different granularities is crucial for tasks like predictive maintenance and performance optimization. However, existing time series segmentation methods face two key challenges: (1) the inability to handle multiple levels of granularity within a unified model, and (2) limited adaptability to new, evolving patterns in dynamic environments. To address these challenges, we propose PromptTSS, a novel framework for time series segmentation with multi-granularity states. PromptTSS uses a unified model with a prompting mechanism that leverages label and boundary information to guide segmentation, capturing both coarse- and fine-grained patterns while adapting dynamically to unseen patterns. Experiments show PromptTSS improves accuracy by 24.49% in multi-granularity segmentation, 17.88% in single-granularity segmentation, and up to 599.24% in transfer learning, demonstrating its adaptability to hierarchical states and evolving time series dynamics. Our code is available at https://github.com/blacksnail789521/PromptTSS.
Multivariate Wind Power Time Series Forecasting with Noise-Filtering Neural ODEs
In wind energy generation, wind power prediction traditionally relies on the simulation of wind turbine operational data; recently, many complex deep learning networks have challenged this traditional paradigm. Multivariate Wind Power Time Series (MWTS) with strong volatility are often non-uniformly sampled, making them difficult for traditional time series methods to model. To address these challenges, our approach for MWTS forecasting includes two modules: (1) an ODE-filter module that uses a noise extraction network to learn continuous-time Ordinary Differential Equation (ODE) dynamics, removing high-frequency noise by treating the noise as a neural flow that pushes the ODE forward in time; and (2) an adaptive decomposition module that applies a gradient-driven, unfixed window to capture trend and seasonal components, including more abrupt changes. Our method models the precise wind power evolution, can naturally forecast multivariate wind power time series, and reduces interference from noise and outliers in the data. Experimental results show that our approach outperforms existing models on both regularly and irregularly sampled MWTS.
Stamp: Semantic-Aware Sub-trajectory Anomaly Detection with Diffusion Multi-model Pool for Evolving Data Streams
Trajectory anomaly detection, as a fundamental operation for moving object pattern discovery, plays an irreplaceable and critical role in spatio-temporal location-based services. Conducting online detection based on current positions and their contextual semantics can significantly enhance the value of trajectory data. However, existing approaches suffer from two fundamental limitations: 1) they treat trajectories as indivisible sequences or apply rigid segmentation strategies, and 2) they use a single detection model that struggles to adapt to concept drift caused by evolving trajectory distributions. These limitations make it impossible to detect abnormal trajectories in a timely and semantically comprehensive manner. To fill this gap, we propose Stamp, a novel framework for Semantic-aware sub-Trajectory Anomaly detection with a diffusion Multi-model Pool. Stamp comprises three key innovations: 1. It employs a semantic-driven dynamic segmentation mechanism that identifies natural breakpoints in trajectories based on changes in road semantics, rather than fixed rules. 2. It enhances trajectory representation by embedding road network semantic vectors, capturing both spatial geometry and functional urban characteristics. 3. It employs a pool of diffusion models that dynamically evolves through reliability assessment, similarity measurement, and strategic merging operations, ensuring adaptability to concept drift while leveraging the superior generative capabilities of diffusion models over traditional autoencoders. Experimental results on two large-scale real-world urban trajectory datasets show that Stamp improves detection efficiency by 35%, AUPR by 5.6%, and F1-score by 2.7% compared to state-of-the-art methods, demonstrating its effectiveness for real-time anomaly detection in complex urban environments.
Spreader Behavior Forecasting: Intent-aware Neural Processes for Intervening Misinformation
The behavior of spreaders on social media evolves continuously, driven by shifting intentions and interactions with emerging news topics. Traditional approaches have focused on identifying misinformation spreaders, but have often relied on a static ground-truth label, limiting their applicability for implementing time-sensitive platform interventions. In contrast, our work tackles spreader behavior forecasting through an account-level credit score, modeling the temporal evolution of spreader behavior to capture the intent shifts that drive misinformation spreading. To this end, we propose a novel Intent-aware Neural Processes (INP) model, which focuses on tracking the evolving intent of spreaders over time. The model leverages a state transition structure and an intent state thinning algorithm to improve latent representations, enabling more accurate predictions of future spreader behavior. Experimental results on restructured datasets demonstrate the effectiveness of INP in identifying temporal risk regions for proactive misinformation intervention.
Structure-prior Informed Diffusion Model for Graph Source Localization with Limited Data
Source localization in graph information propagation is essential for mitigating network disruptions, including misinformation spread, cyber threats, and infrastructure failures. Existing deep generative approaches face significant challenges in real-world applications due to limited propagation data availability. We present SIDSL (Structure-prior Informed Diffusion model for Source Localization), a generative diffusion framework that leverages topology-aware priors to enable robust source localization with limited data. SIDSL addresses three key challenges: unknown propagation patterns through structure-based source estimations via graph label propagation, complex topology-propagation relationships via a propagation-enhanced conditional denoiser with GNN-parameterized label propagation module, and class imbalance through structure-prior biased diffusion initialization. By learning pattern-invariant features from synthetic data generated by established propagation models, SIDSL enables effective knowledge transfer to real-world scenarios. Experimental evaluation on four real-world datasets demonstrates superior performance with 7.5-13.3% F1 score improvements over baselines, including over 19% improvement in few-shot and 40% in zero-shot settings, validating the framework's effectiveness for practical source localization. Our code can be found here (https://github.com/tsinghua-fib-lab/SIDSL).
HyperGenFL: Hypernetwork-Generated Model Aggregation in Federated Learning
Federated learning is a decentralized framework that enables client participation in collaborative learning without centralized data collection. However, the framework is susceptible to suboptimal model convergence induced by heterogeneity among the client datasets. These discrepancies, including label imbalance, dissimilarity in data distributions, and uneven data volumes between clients, may cause disagreements among local client updates, affecting the ability of the global model to converge effectively during aggregation. We suggest that one potential solution to this problem lies in weighting the model aggregation by client importance and client-to-client relationships. Based on this idea, we propose HyperGenFL (HG-FL), a hypernetwork that generates aggregation weights from learnable client embeddings without requiring any training or benchmarking data. HG-FL utilizes the attention mechanism to capture inter-client relationships based on learnable client-specific embeddings in order to generate model aggregation weights dynamically during federated learning. By guiding the aggregation process with these learnable relationships between local models, HG-FL reduces update conflicts and improves global model performance. We assess HG-FL under various data-heterogeneous environments based on different benchmark datasets including Fashion-MNIST, CIFAR10, CIFAR100 and Tiny-ImageNet. Experimental results demonstrate that HG-FL can achieve superior performance over a range of existing baseline methods under challenging cases with various heterogeneous environments, large models and a large number of clients.
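The aggregation step that HG-FL's hypernetwork produces weights for is, at its core, a weighted average of client parameters. The sketch below shows that generic operation, with a softmax over random scores standing in for the attention-generated, client-specific weights; it is not the HG-FL hypernetwork itself.

```python
# Generic weighted model aggregation in federated learning.
import numpy as np

def aggregate(client_params, weights):
    """client_params: list of dicts {layer_name: ndarray}; weights: (num_clients,) summing to 1."""
    agg = {}
    for name in client_params[0]:
        stacked = np.stack([cp[name] for cp in client_params])   # (num_clients, ...)
        agg[name] = np.tensordot(weights, stacked, axes=1)       # weighted average per layer
    return agg

rng = np.random.default_rng(0)
num_clients = 4
clients = [{"w": rng.standard_normal((8, 8)), "b": rng.standard_normal(8)}
           for _ in range(num_clients)]
scores = rng.standard_normal(num_clients)                        # stand-in for hypernetwork output
weights = np.exp(scores) / np.exp(scores).sum()                  # softmax -> aggregation weights
global_model = aggregate(clients, weights)
print({k: v.shape for k, v in global_model.items()})
```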
ActiViz: Understanding Sample Selection in Active Learning through Boundary Visualization
The performance of Active Learning (AL) methods varies widely, influenced by the query strategy, model, and dataset, yet the reasons for this variation remain unclear and insufficiently studied. Commonly used metrics like accuracy, precision, and recall provide only limited analytical perspectives, and no research has effectively uncovered or explained the reasons behind these performance variations, leaving a gap in understanding of the factors that influence the success or failure of AL methods. To address this issue, we propose a novel method and tool leveraging Voronoi Diagrams to visualize AL processes by illustrating interactions between classification decision boundary changes and queried samples across AL iterations. We perform experiments on synthetic and real-world datasets to validate the effectiveness of our method and analyze various AL query strategies. By visualizing the AL process, we illustrate how different query strategies progressively select samples and influence performance in each iteration. This reveals the potential benefits of adapting query strategies at different learning stages to improve active learning efficiency.
CCAgent: Coordinating Collaborative Data Scaling for Operating System Agents via Web3
The current AI revolution, fueled by Large Language Models (LLMs), heavily relies on vast open-access internet data. However, the Operating System (OS) Agent field faces a significant data sparsity challenge due to the lack of public data collection systems and privacy concerns. To address this, we introduce CCAgent Net, a system that coordinates and incentivizes internet users to contribute to scaling OS agent datasets. Furthermore, we propose GUI-Pipe, an automated data post-processing pipeline that evaluates, filters, and transforms raw user-uploaded OS data into trainable image-instruction-answer data. This process results in CCAgent-Instruct, the largest instruction-based multi-platform GUI agent dataset. Our experiments demonstrate CCAgent's effectiveness in advancing OS Agent development. For instance, our CCAgent-GUI-3B model achieves a score of 33.7 (+127%) on the challenging out-of-domain Screenspots-Pro benchmark, significantly outperforming other advanced open-source models like UI-TARS-2B, Qwen2.5-VL-3B/7B, and UGround-V1-7B, even those of larger sizes. Our experiments also reveal the scaling behaviors of GUI Agent training and insights for future direction.
M-LLM3REC: A Motivation-Aware User-Item Interaction Framework for Enhancing Recommendation Accuracy with LLMs
Recommendation systems are essential for both user experience and platform efficiency, alleviating information overload and supporting decision-making. Traditional methods such as content-based filtering, collaborative filtering, and deep learning have achieved impressive results, but cold-start and sparse-data scenarios remain challenging. Existing solutions either generate pseudo-interaction sequences, which often introduce redundant or noisy signals, or rely heavily on semantic similarity, overlooking dynamic shifts in user motivation. To address these limitations, this paper proposes a novel recommendation framework, termed M-LLM3REC, which leverages large language models for deep motivational signal extraction from limited user interactions. M-LLM3REC comprises three integrated modules: the Motivation-Oriented Profile Extractor (MOPE), the Motivation-Oriented Trait Encoder (MOTE), and the Motivational Alignment Recommender (MAR). By emphasizing motivation-driven semantic modeling, M-LLM3REC delivers robust, personalized, and generalizable recommendations, particularly boosting performance in cold-start situations compared with state-of-the-art frameworks.
Efficient Mask Learning for Language Model Fine-Tuning
Parameter-efficient fine-tuning (PEFT) of pre-trained language models (PLMs) has shown promising results by updating significantly fewer parameters than full fine-tuning. Masking-based fine-tuning is one type of PEFT method that freezes the majority of the model parameters during fine-tuning. Existing masking-based fine-tuning methods either manually select the trainable parameters (heuristic-based) or perform mask learning to adaptively select the trainable parameters at high memory and computation cost. To address these problems, this paper proposes Low-Rank based Efficient Mask Learning (LoReML). LoReML performs mask learning based on low-rank decomposition and matrix reconstruction with a small ratio of new parameters. After mask learning, LoReML uses the scaled intermediate results from mask learning as a warm-start initialization to boost model quality, then freezes the masked parameters accordingly and fine-tunes the PLM. Moreover, LoReML exploits sparse training techniques to enhance memory efficiency in masking-based fine-tuning. Experimental results across various tasks and pre-trained backbones demonstrate that LoReML notably outperforms existing heuristic-based methods. Moreover, LoReML achieves competitive or better performance compared with adaptive mask learning methods, while improving memory and computation efficiency by over 50% in mask learning.
Adaptive Heterogeneous Graph Neural Networks: Bridging Heterophily and Heterogeneity
Heterogeneous graphs (HGs) are common in real-world scenarios and often exhibit heterophily. However, most existing studies focus on either heterogeneity or heterophily in isolation, overlooking the prevalence of heterophilic HGs in practical applications; this oversight degrades their performance. In this work, we first identify two main challenges in modeling heterophilic HGs: (1) varying heterophily distributions across hops and meta-paths; and (2) the intricate and often heterophily-driven diversity of semantic information across different meta-paths. We then propose the Adaptive Heterogeneous Graph Neural Network (AHGNN) to tackle these challenges. AHGNN employs a heterophily-aware convolution that accounts for heterophily distributions specific to both hops and meta-paths. It then integrates messages from diverse semantic spaces using a coarse-to-fine attention mechanism, which filters out noise and emphasizes informative signals. Experiments on seven real-world graphs and against twenty baselines demonstrate the superior performance of AHGNN, particularly in high-heterophily situations.
Energy-Guided Diffusion Sampling for Long-Term User Behavior Prediction in Reinforcement Learning-based Recommendation
Reinforcement learning-based recommender systems (RL4RS) have gained attention for their ability to adapt to dynamic user preferences. However, these systems face challenges, particularly in offline settings, where data inefficiency and reliance on pre-collected trajectories limit their broader applicability. While offline reinforcement learning methods leverage extensive datasets to address these issues, they often struggle with noisy data and fail to capture long-term user preferences, resulting in suboptimal recommendation policies. To overcome these limitations, we propose Diffusion-enhanced Actor-Critic for Offline RL4RS (DAC4Rec), a novel framework that integrates diffusion processes with reinforcement learning to model complex user preferences more effectively. DAC4Rec leverages the denoising capabilities of diffusion models to enhance the robustness of offline RL algorithms and incorporates a Q-value-guided policy optimization strategy to better handle suboptimal trajectories. Additionally, we introduce an energy-based sampling strategy to reduce randomness during recommendation generation, ensuring more targeted and reliable outcomes. We validate the effectiveness of DAC4Rec through extensive experiments on six real-world offline datasets and in an online simulation environment, demonstrating its ability to optimize long-term user preferences. Furthermore, we show that the proposed diffusion policy can be seamlessly integrated into other commonly used RL algorithms in RL4RS, highlighting its versatility and wide applicability.
Maximum In-Support Return Modeling for Dynamic Recommendation with Language Model Prior
Reinforcement Learning-based recommender systems (RLRS) offer an effective way to handle sequential recommendation tasks but often face difficulties in real-world settings, where user feedback data can be sub-optimal or sparse. In this paper, we introduce MDT4Rec, an offline RLRS framework that builds on the Decision Transformer (DT) to address two major challenges: learning from sub-optimal histories and representing complex user-item interactions. First, MDT4Rec shifts the trajectory stitching procedure from the training phase to action inference, allowing the system to shorten its historical context when necessary and thereby ignore negative or unsuccessful past experiences. Second, MDT4Rec initializes DT with a pre-trained large language model (LLM) for knowledge transfer, replaces linear embedding layers with Multi-Layer Perceptrons (MLPs) for more flexible representations, and employs Low-Rank Adaptation (LoRA) to efficiently fine-tune only a small subset of parameters. We evaluate MDT4Rec on five public datasets and in an online simulation environment, demonstrating that it outperforms existing methods.
Target Item-oriented Conditional Diffusion Differential Transformer for Next-Item Prediction
Sequential recommendation (SR) aims to capture users' dynamic preferences based on their historical interactions and provide personalized next-item prediction. Multi-behavior SR (MBSR) further considers behavior types of user-item interactions, which can reveal diverse user interests and alleviate the data sparsity issue w.r.t. the target purchase behaviors. Most existing MBSR approaches ignore the importance of target items closely related to user interests. Moreover, they often suffer from the problem of limited vector representation capability. To tackle the above two challenges, we propose a novel solution called target item-oriented conditional diffusion differential Transformer (ICDDT). Specifically, our ICDDT introduces distribution representations via the diffusion model, allowing effective utilization of target item information during training to better capture user preferences. Firstly, our ICDDT achieves a more appropriate behavior-aware step selection in the diffusion phase by distinguishing the sampling distributions of diffusion steps w.r.t. behavior types. Secondly, our ICDDT introduces three conditions of interaction sequences, target behaviors and diffusion steps into the reverse phase to guide the training of the differential Transformer-based approximator, generating denoised target item representations as user personalized interests. Finally, our ICDDT sets an inference step truncation factor to fit the diffusion step sampling distributions and accelerate the inference process. We conduct extensive experiments on two real-world datasets, where the results show that our ICDDT significantly outperforms all baselines on all metrics. The datasets, source codes and scripts are available at https://github.com/Erin-Gr/ICDDT.
ConsensNet: A Unified Consensus-Centric Framework for Incomplete Multi-View Clustering
Incomplete Multi-View Clustering (IMVC) addresses the challenge of missing data by leveraging available information and effectively mining cross-view relationships. While contrastive learning has recently been introduced into IMVC for discriminative representation learning, existing methods typically adopt pairwise contrastive strategies with view-specific reconstruction and heuristic fusion schemes. These approaches are generally suboptimal when facing high missing-view ratios and struggle to capture latent cross-view dependencies. To overcome these limitations, we propose ConsensNet, a unified consensus-centric framework for IMVC. This is achieved through a unified architecture that integrates contrastive cross-view alignment, consensus prediction, and attention-aware fusion. By aligning all available views into a shared semantic space, ConsensNet effectively captures latent cross-view dependencies without requiring high-quality view completion. Moreover, the attention-aware fusion mechanism dynamically assigns weights to each view based on its relevance to the consensus, thereby reducing the impact of noisy or weakly correlated views. Extensive experiments on multiple datasets demonstrate that ConsensNet consistently outperforms state-of-the-art IMVC methods, particularly under high missing-view scenarios, highlighting its robustness and practical significance.
NR-GCF: Graph Collaborative Filtering with Improved Noise Resistance
Graph Neural Networks (GNNs) have emerged as the preferred backbone model of recommender systems, owing to their strong capability in capturing the intricate topological relationships within user-item interactions. Nevertheless, a common oversight in existing studies is the presumption that these interactions are inherently reliable, ignoring the reality that a significant fraction of user-item engagements, such as accidental clicks, are inherently noisy. Extensive studies have revealed that GNNs are vulnerable to such noisy edges within graph-structured data, as noisy edges can mislead the network into overfitting incorrect interaction patterns and propagating such incorrect information through the entire interaction network. To address these challenges, in this paper we propose a novel noise-robust GNN-based training strategy for recommendation, known as Noise-Resistant Graph Collaborative Filtering (NR-GCF). NR-GCF adopts a two-stage learning paradigm that filters out unreliable interactions by leveraging the memorization effect of GNNs. It further utilizes representation modulation to learn noise-resistant embeddings, enhancing robustness for recommendation tasks. Comprehensive experiments and ablation studies demonstrate the effectiveness and robustness of the proposed NR-GCF. Our implementation is available at: https://github.com/1197151063/NRGCF.git
High-Context Empathy in Conversations for Large Language Models
Large Language Models (LLMs) exhibit remarkable capabilities across various downstream tasks, including empathetic dialogue. However, a non-trivial question arises: do they possess high-context empathy, and can they generate emotional interactions with humans? High-context empathy, which tends to be more indirect and concise, like Chinese-style empathy, differs from the current empathy capabilities of LLMs, which are predominantly low-context: direct and lengthy, resembling English-style empathy. In this paper, we first construct a comprehensive Chinese High-context Empathy Dialogue dataset (HED), which consists of emotional, role-based emotional, personality-based emotional, and role-personality-based emotional dialogues. Next, we explore whether LLMs exhibit high-context empathy in conversations. We then propose an innovative High-context Empathy Network (HEN) to improve LLMs' ability to generate high-context empathetic responses. Our empirical study demonstrates that there is much room for improvement in LLMs' high-context empathetic responses, and that the proposed HEN not only significantly improves LLMs' ability to generate such responses, but also has positive effects on LLMs in solving similar sentiment-related tasks.
Rethinking Client-oriented Federated Graph Learning
As a new distributed graph learning paradigm, Federated Graph Learning (FGL) facilitates collaborative model training across local systems while preserving data privacy. We review existing FGL approaches and categorize their optimization mechanisms into: (1) Server-Client (S-C), where clients upload local model parameters for server-side aggregation and global updates; and (2) Client-Client (C-C), which allows clients to exchange information directly and customize their local training processes. We reveal that C-C shows superior potential due to its refined communication structure. However, existing C-C methods broadcast redundant node representations, incurring high communication costs and privacy risks at the node level. To this end, we propose FedC4, which combines graph Condensation with C-C Collaboration optimization. Specifically, FedC4 employs a graph condensation technique to distill the knowledge of each client's graph into a few synthetic embeddings instead of transmitting node-level knowledge. Moreover, FedC4 introduces three novel modules that allow the source client to send distinct node representations tailored to the target client's graph properties. Experiments on eight public real-world datasets show that FedC4 outperforms state-of-the-art baselines in both task performance and communication cost. Our code is available at https://github.com/Ereshkigal1/FedC4.
Hypercomplex Prompt-aware Multimodal Recommendation
Modern recommender systems face critical challenges in handling information overload while addressing the inherent limitations of multimodal representation learning. Existing methods suffer from three fundamental limitations: (1) a restricted ability to represent rich multimodal features through a single representation; (2) linear modality fusion strategies that ignore the deep nonlinear correlations between modalities; and (3) static optimization methods that fail to dynamically mitigate the over-smoothing problem in graph convolutional networks (GCNs). To overcome these limitations, we propose HPMRec, a novel Hypercomplex Prompt-aware Multimodal Recommendation framework, which utilizes multi-component hypercomplex embeddings to enhance the representation diversity of multimodal features. HPMRec adopts hypercomplex multiplication to naturally establish nonlinear cross-modality interactions that bridge semantic gaps, which is beneficial for exploring cross-modality features. HPMRec also introduces a prompt-aware compensation mechanism to mitigate the misalignment between components and the loss of modality-specific features, which fundamentally alleviates the over-smoothing problem. It further designs self-supervised learning tasks that enhance representation diversity and align different modalities. Extensive experiments on four public datasets show that HPMRec achieves state-of-the-art recommendation performance.
FROG: Fair Removal on Graph
With growing emphasis on privacy regulations, machine unlearning has become increasingly critical in real-world applications such as social networks and recommender systems, many of which are naturally represented as graphs. However, existing graph unlearning methods often modify nodes or edges indiscriminately, overlooking their impact on fairness. For instance, forgetting links between users of different genders may inadvertently exacerbate group disparities. To address this issue, we propose a novel framework that jointly optimizes both the graph structure and the model to achieve fair unlearning. Our method rewires the graph by removing redundant edges that hinder forgetting while preserving fairness through targeted edge augmentation. We further introduce a worst-case evaluation mechanism to assess robustness under challenging scenarios. Experiments on real-world datasets show that our approach achieves more effective and fair unlearning than existing baselines.
Advancing Temporal Sensitive Question Answering through Progressive Multi-Step Reflection
Retrieval-augmented generation (RAG) has demonstrated strong potential in enhancing large language models (LLMs) for complex, real-world question answering. However, existing RAG frameworks remain inadequate for temporal scenarios, primarily because they cannot jointly model temporal constraints in both retrieval and reasoning. On the retrieval side, traditional approaches focus on semantic similarity, often returning outdated or temporally misaligned evidence. On the generation side, these systems frequently produce factually incorrect or hallucinated answers when confronted with incomplete or temporally inconsistent information. Motivated by these limitations, we propose ChronoReflect+, a temporal logic-aware RAG framework that incorporates hybrid temporal-aware retrieval and progressive multi-step reflection. Our method iteratively refines both retrieval and reasoning, identifying and bridging information gaps as context accumulates. Extensive experiments demonstrate that ChronoReflect+ significantly outperforms state-of-the-art RAG baselines, improving end-to-end accuracy by 15.2%, particularly on questions involving implicit time expressions and multi-hop reasoning.
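As an illustration of what hybrid temporal-aware retrieval can look like (not ChronoReflect+'s actual scoring function), the sketch below mixes cosine similarity with an exponential temporal-alignment term; `alpha` and `tau` are illustrative parameters.

```python
import numpy as np

def hybrid_temporal_score(q_emb, doc_embs, q_time, doc_times, alpha=0.7, tau=365.0):
    """Blend semantic similarity with a temporal-alignment term.

    Illustrative only: `alpha` weighs semantics against time, and the temporal
    term decays exponentially with the gap (in days) between the question's
    target date and each document's timestamp.
    """
    sem = doc_embs @ q_emb / (
        np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(q_emb) + 1e-9)
    temporal = np.exp(-np.abs(doc_times - q_time) / tau)
    return alpha * sem + (1 - alpha) * temporal

# Rank three candidate documents for a question whose target time is day 1000.
scores = hybrid_temporal_score(np.random.randn(16), np.random.randn(3, 16),
                               q_time=1000.0, doc_times=np.array([990.0, 400.0, 1500.0]))
ranking = np.argsort(-scores)
```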
Evolving Graph-Based Context Modeling for Multi-Turn Conversational Retrieval-Augmented Generation
Conversational Retrieval-Augmented Generation (RAG) systems enhance user interactions by integrating large language models (LLMs) with external knowledge retrieval. However, multi-turn conversations present significant challenges, including implicit user intent and noisy context, which hinder accurate retrieval and response generation. Existing approaches often struggle with the unstructured conversational context and fail to model explicit relations among conversational turns. Moreover, they do not leverage historically relevant passages effectively. To overcome these limitations, we propose EvoRAG, a novel framework that maintains an evolving knowledge graph aligned with the unstructured conversational context. This graph explicitly captures relations among user queries, system responses, and relevant passages across conversational turns, serving as a structured representation of the context. EvoRAG includes three key components: (1) a dual-path retrieval module for context denoising, (2) a unified knowledge integration module for query rewriting and summarization, and (3) a graph-enhanced RAG module for accurate retrieval and response generation. Experiments on four public conversational RAG datasets show that EvoRAG significantly outperforms strong baselines, particularly in handling topic shifts and long dialogue contexts.
PP-STAT: An Efficient Privacy-Preserving Statistical Analysis Framework using Homomorphic Encryption
With the widespread adoption of cloud computing, the need for outsourcing statistical analysis to third-party platforms is growing rapidly. However, handling sensitive data such as medical records and financial information in cloud environments raises serious privacy concerns. In this paper, we present PP-STAT, a novel and efficient Homomorphic Encryption (HE)-based framework for privacy-preserving statistical analysis. HE enables computations to be performed directly on encrypted data without revealing the underlying plaintext. PP-STAT supports advanced statistical measures, including Z-score normalization, skewness, kurtosis, coefficient of variation, and Pearson correlation coefficient, all computed securely over encrypted data. To improve efficiency, PP-STAT introduces two key optimizations: (1) a Chebyshev-based approximation strategy for initializing inverse square root operations, and (2) a pre-normalization scaling technique that reduces multiplicative depth by folding constant scaling factors into mean and variance computations. These techniques significantly lower computational overhead and minimize the number of expensive bootstrapping procedures. Our evaluation on real-world datasets demonstrates that PP-STAT achieves high numerical accuracy, with mean relative error (MRE) below 2.4×10⁻⁴. Notably, the encrypted Pearson correlation coefficient between the smoker attribute and charges reaches 0.7873, with an MRE of 2.86×10⁻⁴. These results confirm the practical utility of PP-STAT for secure and precise statistical analysis in privacy-sensitive domains.
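To make the HE-friendly arithmetic concrete, the plaintext sketch below computes a Pearson correlation using only additions and multiplications, with 1/sqrt(x) approximated by a polynomial seed refined by Newton steps; the seed, step count, and function names are illustrative stand-ins for PP-STAT's Chebyshev-based initialization, not its actual implementation.

```python
import numpy as np

def inv_sqrt_poly(x, newton_steps=5):
    """Approximate 1/sqrt(x) using only additions and multiplications.

    Plaintext analog of an HE-friendly routine: a crude linear seed (standing
    in for a Chebyshev initialization) refined by Newton iterations
    y <- y * (1.5 - 0.5 * x * y^2), which converge whenever 0 < y0 < sqrt(3/x).
    """
    y = 1.8 - 0.8 * x
    for _ in range(newton_steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y

def pearson_poly(x, y):
    """Pearson correlation written with sums, products, and inv_sqrt_poly only."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    vx, vy = ((x - mx) ** 2).mean(), ((y - my) ** 2).mean()
    return cov * inv_sqrt_poly(vx) * inv_sqrt_poly(vy)

x = np.random.rand(1000) + 0.3
y = 0.8 * x + 0.1 * np.random.rand(1000)
print(pearson_poly(x, y), np.corrcoef(x, y)[0, 1])   # the two values agree closely
```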
DYCOR: Capturing Hidden Stock Relationships for Stock Trend Prediction
Stock trend prediction, the task of forecasting future trends of stocks from their historical feature sequences, remains highly challenging due to the complex and dynamic nature of financial markets. In reality, stocks form diverse relationships that transcend traditional sector boundaries as market conditions evolve, i.e., stocks within the same sector may display different trends, while those in different sectors often exhibit similar movements. However, most existing stock prediction methods rely on predefined static relationships and lack the flexibility to adapt to changing market dynamics. Furthermore, objectives widely adopted in prior work have limitations in capturing complex patterns and relationships in stock market data. To address these limitations, we propose DYCOR, a novel stock trend prediction method that integrates two key innovations: (i) dynamic stock clustering, which captures market characteristics without relying on predefined relationship data by adaptively discovering hidden stock relationships; and (ii) correlation-aware training, which aligns predicted and ground-truth stock trends by reflecting their correlations in a fine-grained manner. We evaluate DYCOR on three datasets widely used in existing research (NASDAQ, NYSE, and S&P 500), where it demonstrates superior performance across correlation-based and retrieval-based metrics compared to state-of-the-art baseline methods, while maintaining competitive runtime efficiency.
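One simple way to picture dynamic stock clustering (purely illustrative, not DYCOR's mechanism) is to re-cluster stocks in each rolling window from their correlation profiles:

```python
import numpy as np
from sklearn.cluster import KMeans

def dynamic_clusters(returns, window=60, n_clusters=8):
    """Re-cluster stocks in each rolling window from their correlation profiles.

    Illustrative stand-in for dynamic stock clustering: instead of fixed sector
    labels, each window's correlation matrix supplies per-stock feature vectors
    (its rows), which are clustered afresh as the market evolves.
    `returns` has shape (time, n_stocks).
    """
    labels_over_time = []
    for t in range(window, returns.shape[0], window):
        corr = np.corrcoef(returns[t - window:t].T)          # (n_stocks, n_stocks)
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(corr)
        labels_over_time.append(labels)
    return labels_over_time

labels = dynamic_clusters(np.random.randn(500, 100))         # 500 days, 100 stocks
```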
Learnable Orthogonal Decomposition for Non-Regressive Prediction for PDE
Modeling the spatio-temporal evolution of complex physical systems remains a fundamental challenge in both deep learning and scientific computing. While recent methods such as Transformers and Neural Operators have shown promise in learning PDE solutions, their reliance on auto-regressive forecasting often increases computational overhead and accumulates prediction errors over time. In this paper, we propose Learnable Orthogonal Decomposition (LOD), a non-regressive framework that integrates ideas from classical Proper Orthogonal Decomposition (POD) with modern deep learning. LOD first performs parameter-wise POD: at each time step, we apply POD to an ensemble of PDE solutions generated under different physical parameters, yielding a time-indexed set of orthonormal spatial bases. These bases initialize a learnable dictionary and are refined during end-to-end training. Given only a short prefix of initial conditions for a target parameter setting, a neural encoder predicts, in a single shot, the entire trajectory of parameter-wise POD coefficients. The final solution field is reconstructed by combining the predicted coefficients with the learned bases, avoiding the error accumulation inherent to auto-regressive strategies. Comprehensive experiments on various PDE benchmark datasets demonstrate that LOD achieves state-of-the-art accuracy while significantly reducing computational costs.
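For readers unfamiliar with the classical building block, the snippet below sketches parameter-wise POD via SVD and a single-shot reconstruction from predicted coefficients; in LOD the bases become a learnable dictionary and the coefficients come from a neural encoder, so this is only the numerical skeleton, with illustrative shapes and names, not the authors' implementation.

```python
import numpy as np

def pod_bases(snapshots, r):
    """Top-r orthonormal POD modes of an ensemble of solutions at one time step.

    `snapshots` has shape (n_runs, n_space): one row per physical-parameter
    setting. The rows of the returned array span the dominant spatial modes.
    """
    _, _, vt = np.linalg.svd(snapshots - snapshots.mean(axis=0), full_matrices=False)
    return vt[:r]                                   # (r, n_space), orthonormal rows

def reconstruct(coeffs, bases, mean_field):
    """Rebuild a solution field from predicted POD coefficients in one shot."""
    return coeffs @ bases + mean_field

# Toy example: 32 parameter settings, a 1D field with 256 points, rank-8 basis.
snaps = np.random.randn(32, 256)
bases = pod_bases(snaps, r=8)
field = reconstruct(np.random.randn(8), bases, snaps.mean(axis=0))
```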
BrainX: A Universal Brain Decoding Framework with Feature Disentanglement and Neuro-Geometric Representation Learning
Decoding visual stimuli from human brain activity is a fundamental challenge in cognitive neuroscience and neuroimaging. While recent advances in deep learning have significantly improved the performance of fMRI-to-image decoding, most existing methods overlook the issue of inter-subject variability in fMRI data, which leads to poor generalization across subjects. Current approaches often rely on partially shared model architectures that offer limited generalization and still require subject-specific components, restricting their applicability to unseen subjects. To address this limitation, we propose BrainX, a universal brain decoding framework that constructs a unified fMRI encoder and image generator to achieve subject-agnostic modeling. Specifically, we introduce a feature disentanglement mechanism that extracts subject-shared features from the fMRI embeddings, which are then fed into the image generator to reconstruct visual stimuli. This design eliminates the need for subject-specific models and significantly enhances cross-subject generalization. Additionally, we develop a neuro-geometric fMRI representation learning method that projects 3D cortical structures onto a 2D surface space, effectively mitigating the inaccuracies caused by imprecise geodesic distance estimation in 3D Euclidean space. Extensive experiments on the Natural Scenes Dataset (NSD) demonstrate that BrainX consistently outperforms existing state-of-the-art methods across three decoding settings: within-subject, cross-subject with finetuning, and cross-subject without finetuning.
FedGVD: Efficient Federated Graph Learning via Unidirectional Distillation with Dynamic Virtual Nodes
Federated Graph Learning (FGL) has emerged as a key paradigm for distributed graph machine learning, enabling cross-domain graph collaborative modeling while preserving data privacy. However, existing methods face two major bottlenecks: the structural heterogeneity discrepancy of graph data among clients weakens the generalization ability of the global model; and model heterogeneity leads to inefficient knowledge sharing and complex global aggregation. To address these issues, we propose FedGVD, an efficient framework that constructs a global perspective through data condensation and server-side virtual node generation, which not only preserves the semantic equivalence of the original data but also avoids privacy leakage. Subsequently, by distributing low-dimensional generalizable knowledge for unidirectional distillation, FedGVD enables local models to absorb global knowledge without transmitting local parameters, thus breaking through the challenges of data and structural heterogeneity as well as model heterogeneity. This innovative approach ensures privacy-preserving and efficient federated graph collaboration. Experiments show that FedGVD maintains excellent performance in heterogeneous model scenarios while significantly improving communication efficiency, offering a new approach for privacy-preserving collaborative modeling in FGL. The code is available at https://github.com/Jasonxx4/FedGVD.
TFMAdapter: Lightweight Instance-Level Adaptation of Foundation Models for Forecasting with Covariates
Time Series Foundation Models (TSFMs) have recently achieved state-of-the-art performance in univariate forecasting on new time series simply by conditioning on a brief history of past values. Their success demonstrates that large-scale pretraining across diverse domains can acquire the inductive bias to generalize from temporal patterns in a brief history. However, most TSFMs are unable to leverage covariates (future-available exogenous variables critical for accurate forecasting in many applications) due to their domain-specific nature and the lack of associated inductive bias. We propose TFMAdapter, a lightweight, instance-level adapter that augments TSFMs with covariate information without fine-tuning. Instead of retraining, TFMAdapter operates on the limited history provided during a single model call, learning a non-parametric cascade that combines covariates with univariate TSFM forecasts. However, such learning would require univariate forecasts at all steps in the history, requiring too many calls to the TSFM. To enable training on the full historical context while limiting TSFM invocations, TFMAdapter uses a two-stage method: (1) generating pseudo-forecasts with a simple regression model, and (2) training a Gaussian Process regressor to refine predictions using both pseudo- and TSFM forecasts alongside covariates. Extensive experiments on real-world datasets demonstrate that TFMAdapter consistently outperforms both foundation models and supervised baselines, achieving a 24-27% improvement over base foundation models with minimal data and computational overhead. Our results highlight the potential of lightweight adapters to bridge the gap between generic foundation models and domain-specific forecasting needs.
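A minimal sketch of the two-stage idea, with assumed array shapes and scikit-learn components standing in for the actual pieces (the variable names and the simple lag-regression pseudo-forecaster are illustrative, not TFMAdapter's implementation):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Assumed shapes: `history` (T,) past targets, `covs` (T+H, d) known covariates,
# `tsfm_forecast` (H,) the foundation model's horizon forecast.
T, H, d = 200, 24, 3
history = np.random.randn(T).cumsum()
covs = np.random.randn(T + H, d)
tsfm_forecast = history[-1] + 0.1 * np.random.randn(H).cumsum()

# Stage 1: cheap pseudo-forecasts over the history via a simple lag regression.
lags = np.stack([np.roll(history, k) for k in range(1, 8)], axis=1)[8:]
pseudo = Ridge(alpha=1.0).fit(lags, history[8:]).predict(lags)

# Stage 2: a GP maps (forecast, covariates) -> target; trained on pseudo-forecasts,
# applied to the TSFM forecast at prediction time.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(np.column_stack([pseudo, covs[8:T]]), history[8:])
refined = gp.predict(np.column_stack([tsfm_forecast, covs[T:]]))   # covariate-aware forecast
```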
ExplorAct: Context-Aware Next Action Recommendations for Interactive Data Exploration
Modern data analysis platforms, such as Tableau, Microsoft Power BI, Google Looker Studio, Kibana, and Splunk, have democratized data exploration by enabling users to interact with data through intuitive visual interfaces, eliminating the need for proficiency in query languages like SQL. These platforms allow both experts and non-experts to perform high-level operations and incrementally construct complex analysis workflows. As the volume and complexity of data grow, assisting users in navigating these workflows becomes increasingly important. One promising direction is to provide intelligent next-action recommendations that guide users through meaningful and efficient exploration paths. In this paper, we present ExplorAct, a context-aware next-action recommendation framework that leverages historical session logs to predict and suggest relevant next steps during data exploration. Unlike existing approaches that suffer from scalability issues due to log-size-dependent retrieval, ExplorAct achieves constant-time inference by employing a deep learning architecture that models both the structural and sequential aspects of exploration sessions. Through extensive experiments on four real-world datasets, we show that ExplorAct consistently outperforms state-of-the-art (SOTA) baselines across three core recommendation tasks, while maintaining stable and low-latency inference regardless of log size.
Correlation-aware Online Change Point Detection
Change point detection aims to identify abrupt shifts occurring at multiple points within a data sequence. This task becomes particularly challenging in the online setting, where different types of change can occur, including shifts in both the marginal and joint distributions of the data. In this paper, we address these challenges by tracking the Riemannian geometry of correlation matrices and using Riemannian metrics to compute geodesic distances as an accurate measure of correlation dynamics. We introduce Rio-CPD, a correlation-aware online change point detection framework that integrates the Riemannian geometry of the manifold of symmetric positive definite matrices with the cumulative sum (CUSUM) statistic for detecting change points. Rio-CPD employs a novel CUSUM design that computes the geodesic distance between current observations and the Fréchet mean of prior observations. With appropriate choices of Riemannian metrics, Rio-CPD offers a simple yet effective and computationally efficient algorithm. We also provide a theoretical analysis of standard change point detection metrics within Rio-CPD. Experimental results on both synthetic and real-world datasets demonstrate that Rio-CPD outperforms existing methods in detection accuracy, average detection delay, and efficiency.
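The geometric ingredients are classical and easy to sketch; the toy below uses the affine-invariant geodesic distance, a log-Euclidean mean as a cheap surrogate for the Fréchet mean, and a one-sided CUSUM with illustrative drift and threshold values (not Rio-CPD's tuned procedure):

```python
import numpy as np
from scipy.linalg import expm, inv, logm, sqrtm

def affine_invariant_dist(a, b):
    """Geodesic distance between SPD matrices under the affine-invariant metric."""
    s = inv(sqrtm(a))
    return np.linalg.norm(logm(s @ b @ s), ord="fro")

def log_euclidean_mean(mats):
    """Fréchet mean under the log-Euclidean metric (a cheap surrogate here)."""
    return expm(np.mean([logm(m) for m in mats], axis=0)).real

def cusum_alarm(dists, drift=0.5, threshold=5.0):
    """One-sided CUSUM over a stream of geodesic distances."""
    s = 0.0
    for t, d in enumerate(dists):
        s = max(0.0, s + d - drift)
        if s > threshold:
            return t
    return None

# Toy stream: correlation matrices before and after an abrupt correlation shift.
rng = np.random.default_rng(0)
def corr(rho, n=200, p=5):
    cov = np.full((p, p), rho) + (1 - rho) * np.eye(p)
    return np.corrcoef(rng.multivariate_normal(np.zeros(p), cov, size=n).T)

stream = [corr(0.1) for _ in range(30)] + [corr(0.7) for _ in range(30)]
ref = log_euclidean_mean(stream[:10])
dists = [affine_invariant_dist(ref, c) for c in stream[10:]]
print("alarm at stream index", 10 + cusum_alarm(dists))   # fires shortly after index 30
```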
Mitigating Latent Confounding Bias in Recommender Systems
Recommender systems are crucial for providing personalised experiences, but their effectiveness is often undermined by confounding bias, particularly in the presence of latent confounders. Existing debiasing methods typically address only one type of latent confounding bias, often ignoring the complex interactions caused by latent confounders, such as those between items and user feedback, and between item exposure and user feedback. To tackle these challenges, we propose a novel Deep Instrumental Variables (IV) approach for debiased representation learning in Recommendation Systems, referred to as DIVERS. Specifically, DIVERS leverages user feature embeddings as IVs to mitigate the confounding bias between items and user feedback caused by latent confounders, and combines the debiased item embeddings with an item exposure vector to generate a reconstructed item exposure vector. Moreover, DIVERS employs an identifiable Variational Auto-Encoder (iVAE) to infer identifiable representations by utilising information from both the original and reconstructed item exposure vectors, effectively addressing the confounding bias introduced by latent confounders between item exposure and user feedback. Additionally, we provide theoretical analyses to demonstrate the soundness of using IV and the identifiability of the representation learned by DIVERS. Extensive experiments on both synthetic and real-world datasets confirm that DIVERS outperforms state-of-the-art models in reducing bias and providing reliable recommendations. Our source code is available at: https://github.com/djf-web/DIVERS.
DDE-CLIP: Detail-Guided Dual-Modal Enhancement for Zero-Shot Anomaly Detection
Zero-shot Anomaly Detection (ZSAD) is an emerging task in industrial settings. It aims to detect anomalies in a target dataset without training samples, which is crucial when samples are scarce or data privacy must be preserved. Existing methods largely rely on CLIP, leveraging its internal knowledge to detect anomalies. However, due to its pre-training on natural image-text pairs, CLIP suffers from domain shift, favoring global semantics over fine-grained defect detection in industrial images. Furthermore, most existing methods employ fixed text prompts to guide the model, which makes it difficult to describe diverse and unseen anomalies, leading to poor accuracy. To address these limitations, we propose a Detail-guided Dual-modal Enhancement Model (DDE-CLIP) for the ZSAD task. First, we design the Detail Feature Reinforcement Module (DFRM) to capture local representations of minute defects. Its specialized design effectively enhances the model's perception of fine-grained anomalies and enables the pre-trained CLIP model to better adapt to the unique visual characteristics of industrial images. We then introduce the Visual-guided Text Refinement Module (VTRM), which dynamically optimizes text prompts based on the input image's visual content, particularly the detail features captured by DFRM. This ensures that the text prompts accurately reflect the specific semantics of various defects, thereby significantly enhancing vision-text alignment for unseen anomalies. Overall, DDE-CLIP uses detail features to enhance both the image and text modalities, effectively addressing the challenges of ZSAD. Extensive experiments on 7 real-world industrial product datasets demonstrate that DDE-CLIP exhibits superior detection and localization capabilities compared to other methods. The code is available at https://github.com/zhushengxinyue/DDE-CLIP.
Urban In-context Learning: A New Paradigm for Urban Indicator Prediction
Recent years have witnessed rapid urbanization. In particular, urban indicator prediction has become an important tool for urban planning and decision-making and for advancing urbanization. However, existing methods have two drawbacks. First, they follow the ''pre-training and fine-tuning'' paradigm, which is time-consuming and resource-intensive. Second, to encode urban knowledge for downstream tasks effectively, complex pre-training tasks must be designed to train the model in a task-agnostic manner while ensuring generalization. In this work, we propose UrbanICL, an urban in-context learning framework that serves as a new paradigm for urban indicator prediction. Rather than directly predicting urban indicators, we obtain predictions for new regions by aggregating the downstream labels of similar regions. Specifically, a retrieval-based urban in-context learning module retrieves regions with similar urban semantics and aggregates their corresponding labels to make predictions for new regions. We also design a region-dependent distribution learning module to learn the distribution of unknown regions and help UrbanICL adapt to distributional shifts and outliers. Our in-context learning framework brings a new perspective to urban indicator prediction. We conduct extensive experiments on real-world datasets collected from three cities. The results demonstrate the effectiveness of UrbanICL, which also remains highly resource- and time-efficient.
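A minimal sketch of the retrieve-and-aggregate idea, assuming precomputed region embeddings and softmax-weighted neighbour labels (the weighting scheme is illustrative, not UrbanICL's exact aggregation):

```python
import numpy as np

def in_context_predict(query_emb, region_embs, region_labels, k=10):
    """Predict an indicator for a new region from its nearest neighbours.

    Illustrative reading of retrieval-based urban in-context learning: no
    fine-tuning, just similarity-weighted aggregation of downstream labels.
    """
    sims = region_embs @ query_emb / (
        np.linalg.norm(region_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9)
    top = np.argsort(-sims)[:k]
    w = np.exp(sims[top] - sims[top].max())
    w /= w.sum()                                      # softmax weights over the top-k
    return float(w @ region_labels[top])

pred = in_context_predict(np.random.randn(64),        # embedding of the new region
                          np.random.randn(500, 64),   # embeddings of known regions
                          np.random.rand(500))        # their indicator labels
```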
DIVAgent: A Diversified Search Agent that Mimics the Human Search Process
Search result diversification plays a crucial role in addressing query ambiguity and multi-faceted information needs by reducing redundancy across documents. While previous supervised approaches can achieve superior performance, they require costly, large-scale annotated data. In contrast, unsupervised methods are more flexible and training-free but rely on manually designed ranking functions, often leading to suboptimal performance. Inspired by how humans explore diverse information during real-world search, we propose DIVAgent, a diversified search agent that combines the advantages of supervised and unsupervised methods. DIVAgent introduces LLMs as the ''brain'' to reason over complex and diverse search results and distills human cognitive processes into a workflow tailored for search result diversification. Our search agent first identifies potential user intents and then analyzes the alignment of each document to the intents via an intent-aware module. To guide the generation of diversified document rankings, we design an intent-guided ranker that explicitly links documents to their dominant intents while performing greedy document selection. Experimental results demonstrate that DIVAgent significantly outperforms existing unsupervised baselines and achieves competitive performance with supervised models, highlighting the promise of LLMs for diversified ranking in realistic search scenarios.
Unsupervised Adversarial Contrastive Hashing for Cross-Modal Retrieval
Cross-modal hashing has gained widespread attention in cross-modal retrieval due to its low storage cost and significant computational efficiency. Existing cross-modal hashing methods primarily focus on learning modality invariance by mapping data from different modalities into a shared space and learning unified hash codes. Nevertheless, due to the inherent heterogeneity between different modalities, the common subspace may still exhibit modality discrepancies. This ultimately makes it challenging to achieve semantic alignment, thereby affecting the accuracy of cross-modal retrieval. To address this issue, we propose an Unsupervised Adversarial Contrastive Hashing (UACH) method for cross-modal retrieval. Specifically, we design a cycle generative adversarial network to learn the transformation relationships between different modality feature domains, effectively promoting semantic alignment across modalities. Additionally, we employ dual contrastive learning to simultaneously guide the representation learning and hashing learning components of each specific modality and to learn unified hash codes for each modality, thus mitigating the impact of modality discrepancies. Extensive experiments conducted on three cross-modal benchmark datasets demonstrate that our model outperforms the state-of-the-art baselines.
EventPuzzle: A Benchmark for Multi-Perspective Event Prediction Based on Event Arguments
Event prediction is a critical task in natural language processing, aimed at reasoning and forecasting future events based on known event texts. This paper introduces EventPuzzle, a benchmark designed to evaluate the event prediction capabilities of large language models based on event arguments. By introducing argument points, we design tasks and evaluation methods to assess models' ability to predict events from different argument perspectives. EventPuzzle consists of both closed-ended and open-ended tasks. In the closed-ended task, models select the correct argument point from causal chains, while in the open-ended task, models generate event descriptions using two strategies: Argument-based Generation and Direct Generation. We construct an argument point dataset and evaluate multiple LLMs, demonstrating the models' performance across various tasks. Our experimental analysis reveals the strengths and limitations of current models and suggests future directions for improving event prediction.
Adaptive Bidirectional State Space Model for High-frequency Portfolio Management
State space models (SSMs) have recently shown great potential on long-range sequence modeling tasks. Benefiting from SSMs' low spatio-temporal overhead and powerful modeling capabilities, utilizing them for high-frequency portfolio management is an appealing research direction. However, representing financial data is challenging for SSMs due to: 1) the non-stationary nature of financial markets and 2) the requirement of asset correlations for financial understanding. In this paper, under a deep reinforcement learning (DRL) paradigm for high-frequency portfolio management, we propose a novel Adaptive Bidirectional State Space Model (ABSSM) to tackle the above challenges. Specifically, in order to cope with changing market conditions, we design an adaptive linear time-varying structure, which precisely captures domain shifts in temporal patterns through an input-dependent state transition matrix, thereby seizing fleeting arbitrage opportunities. Furthermore, we enhance this framework by constructing a bidirectional state space layer, which extracts asset correlations by compressing the global context. To the best of our knowledge, this is the first work that solves the high-frequency portfolio management problem by devising a specialized state space model in the DRL framework. Through extensive experiments on real-world data from the U.S., China, and cryptocurrency markets, we show that our proposed ABSSM significantly outperforms state-of-the-art benchmark methods in balancing profits and risks.
Exploring the Impact of Warnings on User Perception towards AI-Generated Content in Search Results
Generative-AI answer boxes have become a default element of modern search engines, yet their responses are not always trustworthy. We study whether a simple disclosure can temper the influence of these answers on users' beliefs and behaviour. In a between-subjects online experiment (N=57), participants formed opinions on one of three controversial topics while interacting with a SERP whose featured answer was produced by ChatGPT. Each participant had a 50% chance of seeing a banner that (i) disclosed the answer's AI origin, (ii) listed three key limitations, and (iii) linked to additional details. The banner did not shift overall opinion means, but it changed how people reacted to the AI: participants who saw the warning were 83% more likely to adopt a stance that contradicted the chatbot's answer than those in the no-banner condition. Stance alignment proved even more decisive: when the AI answer matched a user's initial view, the likelihood of attitude change fell by 85% and exploration of opposing results dropped significantly. Conversely, when the AI disagreed with users, 41% of their post-task explanations converged semantically toward the chatbot's wording (vs. 14% when they already agreed), revealing subtle linguistic uptake detected via BERT embeddings. Together, these findings show that (1) a lightweight disclosure can promote a more critical stance toward AI output, yet (2) pre-existing agreement with the AI strongly anchors users, suppressing both critical search behaviour and reflective revision. We argue that AI-integrated search interfaces should pair transparency banners with additional bias-mitigation strategies to support informed and balanced information seeking.
Decoupling Feature Entanglement for Personalized Federated Learning via Neural Collapse
Heterogeneous data is a critical challenge in Personalized Federated Learning (pFL), as it leads to feature entanglement, making it difficult to share knowledge across clients. Addressing this issue requires a deep understanding of the underlying mechanisms of feature distribution, as well as effective strategies for sharing knowledge among participating clients. Motivated by the phenomenon of Neural Collapse (NC) observed in well-trained deep classification models, we propose FedDemux, a novel pFL framework that facilitates the personalization process by explicitly promoting the emergence of NC. FedDemux tackles feature entanglement through the coordination of two key modules: (1) a simplex learnable embedding (SLE) module guided by NC to learn and rectify local features, and (2) a knowledge decoupling module (KDM) that extracts general knowledge to align local features with the global simplex-learnable embeddings, while personalized knowledge further enhances local inference capabilities. We conduct extensive experiments on three real-world datasets with heterogeneous settings, where FedDemux consistently outperforms state-of-the-art methods in all cases. Specifically, FedDemux achieves up to a 13.54% accuracy improvement over baselines on the CIFAR-100 dataset. Scalability and ablation experiments validate FedDemux's effectiveness and SLE's role in accelerating convergence.
MUSE: A Multi-slice Joint Analysis Method for Spatial Transcriptomics Experiments
Recent advances in spatial transcriptomics (ST) and cost reductions have enabled large-scale multi-slice ST data generation, enhancing the statistical power to detect subtle biological signals. However, cross-slice inconsistencies and data quality variability present significant analytical challenges. To overcome these limitations, we developed MUSE, a computational framework designed for multi-slice joint embedding, spatial domain identification, and gene expression imputation. Specifically, MUSE integrates a two-module architecture to ensure robust cross-slice alignment and data harmonization. The alignment module models each slice as a graph and employs optimal transport to align cells across slices while preserving spatial continuity. The optimization module further refines integration by incorporating an alignment loss, allowing lower-quality data to leverage structural information from higher-quality slices. Additionally, MUSE generates virtual neighbors from aligned cells, enriching contextual information and mitigating data sparsity. These design principles enable seamless integration with existing single-slice methods, extending their applicability to multi-slice ST analysis. To comprehensively evaluate its performance, we applied MUSE to 12 real and 48 simulated datasets spanning a range of data qualities. Across all metrics, MUSE consistently outperformed existing methods in cross-slice consistency, spatial domain identification, and gene expression imputation. To promote accessibility and adoption, we provide MUSE as an open-source software package. As multi-slice ST datasets become increasingly prevalent, MUSE provides a robust and extensible framework designed to effectively integrate growing numbers of slices, thereby advancing the analysis of tissue architectures and spatial gene expression in complex biological systems.
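To illustrate the alignment step in isolation, the sketch below runs a plain entropic (Sinkhorn) optimal-transport solver on a cost that mixes expression dissimilarity and spatial distance; the cost weights and regularization are illustrative, and the snippet is not MUSE's implementation.

```python
import numpy as np

def sinkhorn_coupling(cost, reg=0.1, n_iter=200):
    """Entropic optimal-transport coupling between two sets of cells (uniform weights).

    The coupling's largest entries indicate which cells in slice A correspond
    to which cells in slice B.
    """
    n, m = cost.shape
    K = np.exp(-cost / reg)
    a, b = np.ones(n) / n, np.ones(m) / m
    v = np.ones(m) / m
    for _ in range(n_iter):
        u = a / (K @ v + 1e-12)
        v = b / (K.T @ u + 1e-12)
    return u[:, None] * K * v[None, :]

# Cost mixes expression dissimilarity and spatial distance (weights are illustrative).
expr_a, expr_b = np.random.rand(300, 50), np.random.rand(280, 50)
xy_a, xy_b = np.random.rand(300, 2), np.random.rand(280, 2)
cost = (np.linalg.norm(expr_a[:, None] - expr_b[None], axis=-1)
        + 0.5 * np.linalg.norm(xy_a[:, None] - xy_b[None], axis=-1))
coupling = sinkhorn_coupling(cost / cost.mean())
matches = coupling.argmax(axis=1)   # best-matching cell in slice B for each cell in A
```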
Hearable Image: On-Device Image-Driven Sound Effect Generation for Hearing What You See
There have been various studies on audio generation from images, text, or video. However, existing approaches have not considered on-device environments, because audio generation models are computationally expensive and require heavy storage capacity for their large number of weights. In addition, it is difficult to obtain stable generation outputs, because unexpected results may occur depending on the model inputs. In image-to-audio generation, smartphones contain diverse images, and image features carry many visual contexts, so it is sometimes unpredictable which audio categories will be generated from an image. In this paper, we propose a robust on-device sound effect generation framework for image-to-audio generation based on latent diffusion. First, to avoid unstable and unpredictable audio generation results, we propose a stable sound generation framework with an Audio Feature Dictionary and an Audio-Image Matching Pipeline that generates sound effects from predefined sound effect categories. If an image matches sound effect categories, the proposed framework directly generates sound effects from the audio features corresponding to the matched categories. Second, we propose Multi-Category Generation and a Generation Flow Map to generate robust and diverse sound effects depending on audio categories. Using both global and local features of an image, we can select multiple categories of sound effects. Third, the framework can be implemented on smartphone devices, because we train the proposed model with low computational cost and a small number of model weights under 4-step latent diffusion inference. Various experiments show that the proposed framework solves the on-device sound generation problem while maintaining generation quality and audio-image matching performance compared to large-scale models. Our demo is available at: https://youtu.be/Y5HTr8wwqOA.
LLM-Enhanced Generalized Category Discovery via Iterative Graph Diffusion
Generalized Category Discovery (GCD) aims to identify both known and unknown categories from unlabeled data using limited labeled samples from known classes. The key challenges lie in the scarcity of supervision signals for unknown categories and the difficulty of modeling relationships among samples. Existing methods that rely solely on clustering uncertainty often result in imperfect hard-negative selection, while their use of nearest-neighbor structures hinders the effective utilization of Large Language Model (LLM) annotations for obtaining high-quality supervision. We propose LIGD, a dynamic optimization model that leverages diffusion graphs and LLM annotations to address these issues. By leveraging the semantic-correlation graph, the method selects both hard negatives and unlabeled central samples likely to represent novel categories as high-value samples. In addition, the graph enables effective label propagation through its connected subgraphs, significantly reducing computational costs while enhancing the accuracy of category discovery. To further enhance annotation quality, we introduce a two-stage prompting strategy that queries the LLM twice to accurately assign selected samples to either existing or novel categories. The entire process is repeated iteratively, updating the graph structure and node representations until convergence. Experiments on three GCD datasets demonstrate the significant superiority of LIGD. Most notably, in the challenging scenario where only 25% of categories are labeled, the model achieves substantial improvements while reducing the number of LLM queries by 50%. Code and data are available at https://github.com/wdmmxlbt/LIGD.
Enhancing Multi-Behavior Sequential Recommenders with Behavior-Aware Regularization
In the realm of multi-behavior sequential recommendation (MBSR), the complexity and heterogeneity of user interactions pose substantial challenges for sequence modeling. Existing studies invest significant effort in combining different modules to learn more expressive multi-behavior sequence representations or designing strategies to extract user preferences related to the target behavior. Despite their effectiveness, these methods neglect a thorough analysis of how behavioral information shapes the probability distribution for next-item prediction, which is crucial for accurately modeling user preferences. Motivated by this, we first analyze the learning distribution of MBSR, shedding light on the significance of target behavior in next-item prediction. Building upon this insight, we propose a Behavior-Aware Regularization approach for multi-behavior sequential Recommendation (BAR4Rec), in which we introduce a regularization loss function to preserve the intrinsic constraints of target behavior. In this way, the target probability distribution is extracted from the whole distribution and naturally evolves into a more compatible and tractable form, thus facilitating model design and training. We evaluate the proposed method on three real-world datasets, and the results validate the efficacy of our approach.
MMFair: Fair Learning via Min-Min Optimization
Ensuring group fairness is crucial in applications like facial recognition, medical image analysis, and online comment toxicity classification. A key challenge to achieving group fairness arises from spurious correlations in datasets, where features used by models for predictions are unrelated to the true labels. The widespread use of large-scale pre-trained models as feature extractors, followed by fine-tuning for downstream tasks, can exacerbate this issue. In particular, improper fine-tuning on limited data often leads to overfitting, reinforcing spurious correlations and further undermining group fairness. To address this, we propose MMFair, an algorithm that optimizes perturbations through a min-min optimization approach. These perturbations are applied to the deep embeddings, preventing the model from associating irrelevant features with true labels and thus improving group fairness. Notably, since simple linear classifiers are prone to spurious correlations, we use a linear head in the initial stage to generate perturbations. After optimizing the perturbations in the latent space, we incorporate them into the original embeddings and then train a multi-layer perceptron (MLP) as the final classification head. This two-stage approach helps mitigate the bias problem of the linear head while leveraging the more powerful feature-learning capabilities of MLPs, leading to more stable and accurate classification results. We evaluate MMFair on the Waterbirds, CelebA, and ISIC datasets. The results show that MMFair efficiently improves the accuracy of the worst-performing group.
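A speculative reading of the min-min idea, in which an embedding perturbation and a linear head minimize the same loss before an MLP head is trained on the perturbed embeddings; all names, bounds, and the exact objective below are assumptions rather than MMFair's formulation.

```python
import torch
import torch.nn.functional as F

# All names, bounds, and the objective here are assumptions, not MMFair's formulation.
emb = torch.randn(512, 128)                        # frozen deep embeddings
y = torch.randint(0, 2, (512,))

delta = torch.zeros_like(emb, requires_grad=True)  # perturbation in the latent space
head = torch.nn.Linear(128, 2)                     # linear head used only in stage 1
opt_d = torch.optim.Adam([delta], lr=1e-2)
opt_w = torch.optim.Adam(head.parameters(), lr=1e-2)

for _ in range(100):                               # both players minimize the same loss
    loss = F.cross_entropy(head(emb + delta), y)
    opt_w.zero_grad(); opt_d.zero_grad()
    loss.backward()
    opt_w.step(); opt_d.step()
    with torch.no_grad():
        delta.clamp_(-0.1, 0.1)                    # keep the perturbation small

# Stage 2: train an MLP classification head on the perturbed embeddings (emb + delta).
mlp = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
```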
Higher-Order Information Matters: A Representation Learning Approach for Social Bot Detection
Detecting social bots is crucial for mitigating the spread of misinformation and preserving the authenticity of online conversations. State-of-the-art solutions typically leverage graph neural networks (GNNs) to model user representations from social relationships and metadata. However, these approaches overlook two key factors: the similarity between a user and her neighbors, and the coordinated behaviors of social bots, resulting in suboptimal detection performance. To address these issues, we propose HyperScan, a novel representation learning method for social bot detection. Specifically, we introduce three effective learners to capture pair-wise, hop-wise, and group-wise relations. HyperScan learns pair-wise user representations based on social relations and user features. It then enhances user representations by building hop-wise interactions across the learned pair-wise user representations to capture structure-level proximity information. Subsequently, it models user representations by constructing higher-order (group-wise) relations derived from user profiles, tweets, and social relations to capture feature-level proximity knowledge. By leveraging hop-wise interactions and higher-order relations, HyperScan significantly improves bot detection performance. Our extensive experiments demonstrate that HyperScan outperforms state-of-the-art methods on three benchmark datasets. Additional studies validate the robustness and effectiveness of each component of HyperScan.
DT-FedSDC: A Dual-Target Federated Framework with Semantic Enhancement and Disentangled Contrastive Learning for Cross-Domain Recommendation
Federated cross-domain recommendation aims to alleviate the problem of data sparsity and enable collaborative modeling of user behavior data from different platforms or institutions while ensuring data privacy. Most existing federated cross-domain recommendation methods rely on item IDs for modeling, ignoring the mining and utilization of item semantic information. In addition, due to the heterogeneity of data across domains, the model is prone to domain bias and feature coupling during aggregation, which degrades recommendation performance. This paper proposes a dual-target federated cross-domain recommendation framework with semantic enhancement and disentangled contrastive learning. First, to utilize the semantic information of items, item ID features and text semantic features are jointly fused to enhance the item embedding representations. Second, we propose a user representation decoupling mechanism that explicitly decouples users' preferences into shared and domain-specific preferences, thereby alleviating domain bias and feature coupling. Furthermore, we design a cross-domain contrastive learning module on the server side to enhance the consistency and transferability of shared user representations across different domains. Experimental results show that the proposed algorithm significantly outperforms existing state-of-the-art methods on multiple real-world datasets, demonstrating its excellent performance in federated cross-domain recommendation.
Bidirectional Temporal-Aware Modeling with Multi-Scale Mixture-of-Experts for Multivariate Time Series Forecasting
Recent advances in deep learning have significantly boosted performance in multivariate time series forecasting (MTSF). While many existing approaches focus on capturing inter-variable (a.k.a. channel-wise) correlations to improve prediction accuracy, the temporal dimension, particularly its rich structural and contextual information, remains underexplored. In this paper, we propose BIM3, a novel framework that integrates BIdirectional temporal-aware modeling with Multi-Scale Mixture-of-Experts for MTSF. First, unlike existing methods that treat historical and future temporal information independently, we introduce a novel Timestamp Dual Cross-Attention Module, which employs a symmetric cross-attention mechanism to explicitly capture bidirectional temporal dependencies through timestamp interactions. Second, to address the complex and scale-varying temporal patterns commonly found in multivariate time series, we move beyond recent multi-scale forecasting models that share parameters across all channels and fail to capture channel-specific dynamics. Instead, we design a Multi-Scale Feature Extract Mixture-of-Experts module that adaptively routes time series to specialized experts based on their temporal characteristics. Extensive experiments on multiple real-world datasets show that BIM3 consistently outperforms state-of-the-art methods, highlighting its effectiveness in capturing both temporal structure and inter-variable diversity.
LangPTune: Optimizing Language-based User Profiles for Recommendation
Recent works have shown increasing interest in using natural language-based user profiles for recommender systems, as they offer greater transparency and interpretability compared to traditional embedding-based methods. Most existing approaches rely on zero-shot inference with large language models (LLMs) to generate these profiles, but the resulting quality remains insufficient, leading to suboptimal recommendation performance. In this paper, we present LangPTune, the first end-to-end training framework designed to directly optimize LLM-generated user profiles for recommendation tasks. By explicitly training the LLM for the recommendation objective, our approach significantly outperforms zero-shot baselines. Evaluations across training setups and benchmarks show that LangPTune not only exceeds the performance of zero-shot methods but also matches the performance of state-of-the-art embedding-based baselines. Additionally, we assess whether our training framework maintains the interpretability of user profiles, using both GPT-4 simulations and crowdworker studies.
High-Order Moments Conditional Domain Adaptation Networks for Wearable Human Activity Recognition
Developing scalable wearable human activity recognition (wHAR) models is challenging due to domain shifts that substantially degrade performance across downstream tasks. Unsupervised domain adaptation (UDA) seeks to improve generalization by transferring knowledge from labeled source domains to unlabeled target domains. However, conventional UDA methods primarily align marginal feature distributions while neglecting feature-label dependencies, often leading to negative transfer and sub-optimal performance. Motivated by these limitations, we propose CoDAN, a novel optimization framework that tackles two key challenges: (i) generating reliable pseudo-labels for the unlabeled target domain and (ii) minimizing conditional discrepancies across domains. To address (i), we employ temperature-based entropy minimization (TEM), which calibrates prediction confidence by scaling logits with a temperature parameter to produce robust pseudo-labels. For (ii), we introduce a polynomial kernel-based cross-covariance (PkCC) loss, a high-order statistics-driven approach that maps features into a reproducing kernel Hilbert space (RKHS) to capture richer feature-label dependencies and reduce conditional distribution gaps between domains. In addition, we demonstrate that CoDAN readily extends to partial UDA (pUDA), where the target label space is a subset of the source, and extensive evaluations on public wHAR datasets with diverse label spaces validate its superior performance over state-of-the-art methods in both UDA and pUDA scenarios.
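The two ingredients can be sketched in isolation: a temperature-scaled entropy term for unlabeled target predictions, and an HSIC-style polynomial-kernel cross-covariance between features and (pseudo-)labels. How these terms are weighted and aligned across domains in CoDAN is not reproduced here, and the names and constants below are illustrative.

```python
import torch
import torch.nn.functional as F

def tem_loss(logits, temperature=2.0):
    """Temperature-scaled entropy minimization for unlabeled target predictions."""
    p = F.softmax(logits / temperature, dim=1)
    return -(p * torch.log(p + 1e-8)).sum(dim=1).mean()

def poly_cross_cov(feat, labels_onehot, degree=2, c=1.0):
    """HSIC-style cross-covariance with a polynomial kernel on the features.

    Illustrative building block: it quantifies the dependence between features
    and (pseudo-)labels in an RKHS induced by a polynomial kernel.
    """
    n = feat.size(0)
    K = (feat @ feat.T + c) ** degree             # polynomial feature kernel
    L = labels_onehot @ labels_onehot.T           # linear label kernel
    H = torch.eye(n) - torch.ones(n, n) / n       # centering matrix
    return torch.trace(K @ H @ L @ H) / (n - 1) ** 2

logits_t = torch.randn(64, 6)                     # target-domain logits
feat_s = torch.randn(64, 32)                      # source-domain features
y_s = F.one_hot(torch.randint(0, 6, (64,)), 6).float()
loss = tem_loss(logits_t) + 0.1 * poly_cross_cov(feat_s, y_s)
```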
PathLens: Structurally Enhancing Heterophilic Graphs for GNNs
The notion that standard GNNs perform better on graphs with high homophily led to the development of specialised algorithms for heterophilic datasets in recent years. In this work, we both examine and leverage this notion. Rather than creating new algorithms, we emphasise the importance of understanding and enriching the data. We introduce a novel data engineering technique, PathLens, that enhances the performance of both heterophilic and non-heterophilic GNNs on heterophilic datasets. Our method structurally augments a given heterophilic graph by adding supernodes, thereby creating a network of pathways connecting spectral clusters in the graph. It facilitates additional paths to bring similar (intraclass) nodes closer than dissimilar (interclass) ones by reducing the average shortest path lengths. We draw both intuitive and empirical connections between the relative decreases in intraclass and interclass average shortest path lengths and shifts in the graph's homophily levels, providing a novel perspective that extends beyond traditional homophily measures. We conduct extensive experiments on seven diverse heterophilic datasets using various GNN architectures and also compare with several data-centric techniques, demonstrating significant improvements as high as 37% in node classification performance. Furthermore, our empirical findings highlight the strong sensitivity of several recent GNNs with respect to the random seed used for data splitting, underscoring this often-overlooked factor in GNN evaluation. The code will be available at https://github.com/goyalkaraniit/PathLens.
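A compact sketch of the supernode augmentation under stated assumptions (spectral clustering on the adjacency matrix, one supernode per cluster, supernodes chained into a pathway network); PathLens's exact construction may differ.

```python
import networkx as nx
from sklearn.cluster import SpectralClustering

def add_cluster_supernodes(g: nx.Graph, n_clusters=4, seed=0):
    """Augment a graph with one supernode per spectral cluster.

    Each supernode connects to every member of its cluster, and supernodes are
    chained together, creating short pathways that lower average shortest-path
    lengths between structurally similar nodes.
    """
    nodes = list(g.nodes())
    adj = nx.to_numpy_array(g, nodelist=nodes)
    labels = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                                random_state=seed).fit_predict(adj)
    aug = g.copy()
    supers = [f"super_{c}" for c in range(n_clusters)]
    for node, c in zip(nodes, labels):
        aug.add_edge(node, supers[c])
    for c in range(n_clusters - 1):               # network of pathways between clusters
        aug.add_edge(supers[c], supers[c + 1])
    return aug

aug = add_cluster_supernodes(nx.karate_club_graph())
```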
PERC: A Prior-Guided Framework for Classifying Long-Content Educational Resources with Imbalanced Category Distributions
With the rapid growth of online education, the types and volumes of educational resources have increased significantly. Efficient classification of these resources can substantially reduce manual workload and enhance management effectiveness. However, existing models often struggle to accurately classify long educational content, particularly under imbalanced category distributions. To address these challenges, we propose PERC, a prior-guided framework for classifying long educational resources with imbalanced categories. To the best of our knowledge, PERC is the first framework to incorporate the foundational cognitive dimensions of Bloom's Taxonomy into educational resource classification. First, PERC leverages standardized pedagogical classification guidelines and maps the original label space into a semantically structured prior category space using a Structured State-Space Learning framework. Second, to handle the length and high information density of educational texts, we introduce a Dynamic Sliding Window Attention mechanism that captures both local and partial global dependencies, enabling the extraction of compact, semantically rich representations. Finally, a category-aware classifier integrates the prior representation of each category with the semantic representation of the resource to produce a category-aware embedding for final prediction. To evaluate PERC, we constructed two datasets: EduMix-24 and EduMath-24, comprising 18,799 educational resources manually annotated across 9 lesson types, 15 teaching modes, and 9 activity elements. Classification experiments on all three tasks consistently demonstrate that PERC outperforms state-of-the-art baselines.
Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing
Large pre-trained models have achieved notable success across a range of downstream tasks. However, recent research shows that a type of adversarial attack (i.e., the backdoor attack) can manipulate the behavior of machine learning models by contaminating their training dataset, posing a significant threat to the real-world application of large pre-trained models, especially customized ones. Therefore, addressing the unique challenges of exploring the vulnerability of pre-trained models is of paramount importance. Through empirical studies on the capability to perform backdoor attacks on large pre-trained models (e.g., ViT), we identify the following unique challenges of attacking large pre-trained models: 1) the inability to manipulate or even access large training datasets, and 2) the substantial computational resources required for training or fine-tuning these models. To address these challenges, we establish new standards for an effective and feasible backdoor attack in the context of large pre-trained models. In line with these standards, we introduce our EDT model, an Efficient, Data-free, Training-free backdoor attack method. Inspired by model editing techniques, EDT injects an editing-based lightweight codebook into the backdoor of large pre-trained models, which replaces the embedding of the poisoned image with that of the target image without poisoning the training dataset or training the victim model. Our experiments, conducted across various pre-trained models such as ViT, CLIP, BLIP, and Stable Diffusion, and on downstream tasks including image classification, image captioning, and image generation, demonstrate the effectiveness of our method. Our code is available at https://github.com/donglgcn/Editing/
Twin-Flow Generative Ranking Network for Recommendation
Deep Learning Recommendation Models (DLRMs) often rely on extensive manual feature engineering to improve accuracy and user experience, which increases system complexity and limits how model performance scales with computational resources. Recently, Meta introduced a generative ranking paradigm based on the HSTU block that enables end-to-end learning from raw user behavior sequences and demonstrates a scaling law on large datasets; it can be regarded as the state-of-the-art (SOTA). However, splitting user behaviors into interleaved item and action information significantly increases the input sequence length, which adversely affects both training and inference efficiency. To address this issue, we propose the Twin-Flow Generative Ranking Network (TFGR), which employs a twin-flow mechanism to optimize interaction modeling, ensuring efficient training and inference through end-to-end token processing. TFGR duplicates the original user behavior sequence into a real flow and a fake flow based on the authenticity of the action information, and then defines a novel interaction method between the two flows within the QKV module of the self-attention mechanism. This design reduces computational overhead and improves both training efficiency and inference performance compared to Meta's HSTU-based model. Experiments on both open-source and real industrial datasets show that TFGR outperforms DLRM, which serves as the industrial online baseline with extensive feature engineering, as well as Meta's HSTU and other common recommendation models such as DIN, DCN, DIEN, and DeepFM. Furthermore, we investigate optimal parameter allocation strategies under computational constraints, establishing TFGR as an efficient and effective next-generation generative ranking paradigm.
When Variety Seeking Meets Multi-Sided Recommendation Fairness: A Consistent and Personalized Multi-Objective Optimization Framework
Recommendation research has evolved from solely improving accuracy to addressing ethical and fairness concerns. While prior works focus on optimizing fairness from either the user or product perspective, recent research emphasizes the importance of multi-sided fairness. This issue is inherently challenging due to the competing goals of different stakeholders. To tackle this challenge, we propose a Consistent and Personalized Fairness Recommendation framework with Multi-Objective Integer Programming (CPFR-MOIP). Our framework introduces two key innovations. First, we develop a novel similarity-based individual fairness metric for the user side and formulate a consistent product-side fairness metric, ensuring that the generated recommendation list aligns with the user preference distribution and the expected product exposure distribution. Second, we incorporate users' variety-seeking levels as a moderating factor to adjust fairness trade-offs and introduce personalized weights to balance user-side and product-side fairness. To effectively solve this optimization problem, we devise an alternating algorithm with theoretical guarantee and demonstrate the Pareto optimality of the obtained solutions. Extensive experiments on two real-world datasets demonstrate that our CPFR-MOIP achieves superior multi-sided fairness while maintaining competitive recommendation accuracy. Furthermore, ablation analysis highlights the advantages of incorporating user variety-seeking levels for personalizing fairness trade-offs. Our work paves the way for more ethical and personalized recommendation systems. The implementation code is available at: https://github.com/P0ise-Wang/CPFR-MOIP.
Proto-Yield: An Uncertainty-Aware Prototype Network for Yield Prediction in Real-world Chemical Reactions
Reaction yield prediction underpins computer-aided synthesis planning (CASP). Formulated as a regression problem that takes both reactants and products as input, this task has been extensively studied using machine learning methods based on handcrafted fingerprint features, SMILES encoded by Transformers, and molecular graphs encoded by Graph Neural Networks. However, a major limitation of these methods is their inability to effectively capture and model the underlying uncertainties, which arise both from the inherently stochastic nature of chemical reaction processes and from inconsistencies or noise in how yields are measured and reported. What makes this seemingly simple regression problem even more challenging is the lack of any principled way to account for these uncertainties, owing to missing or unrecorded experimental procedures (a common occurrence in chemical labs). Given these challenges, we propose a new formulation for yield prediction. Rather than assuming a single deterministic yield value for a given reaction, we model the outcome as a probability distribution over three discrete yield regimes: high, medium, and low, reflecting the inherent uncertainty in the reaction process, which is often only partially observed. Accordingly, we propose Proto-Yield, an encoder-agnostic prototype network built on this regime-based formulation. Without access to full reaction processes, Proto-Yield learns to infer latent regimes and their associated yield distributions from noisy, incomplete training data. During inference, Proto-Yield outputs both a calibrated probability distribution over the yield regimes and the predicted yield conditioned on each regime. Extensive experiments on a 41,000-reaction patent corpus and two high-throughput benchmarks show that Proto-Yield improves R2 by up to 15% and reduces RMSE/MAE by 13% compared to baseline methods.
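As a rough illustration of the regime-based formulation described above, the sketch below computes a point estimate as the expectation of yield over a calibrated regime distribution; the regime yields and probabilities are made-up numbers, not values from the paper.

```python
import numpy as np

# Hypothetical regime-conditioned outputs for one reaction, in the spirit of
# Proto-Yield's formulation (numbers are illustrative, not from the paper).
regimes = ["low", "medium", "high"]
p_regime = np.array([0.15, 0.60, 0.25])            # calibrated P(regime | reaction)
yield_given_regime = np.array([18.0, 52.0, 83.0])  # predicted yield (%) per regime

assert np.isclose(p_regime.sum(), 1.0)

# Point estimate: expectation of yield over the regime distribution.
expected_yield = float(p_regime @ yield_given_regime)
most_likely_regime = regimes[int(p_regime.argmax())]

print(f"most likely regime: {most_likely_regime}")
print(f"expected yield: {expected_yield:.1f}%")
```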
KRAFT: A Knowledge Graph-Based Framework for Automated Map Conflation
Digital maps play a crucial role in various applications such as navigation, fleet management, and ride-sharing, necessitating their accuracy and currency, which require timely updates. While the majority of geospatial databases (GDBs) provide high-quality information, their data is (i) limited to specific regions and/or (ii) missing some entities, even in their covered areas. Map conflation is the process of augmenting one GDB with another to supply missing spatial features. Existing map conflation methods suffer from two main limitations: (1) they are designed for the conflation of linear objects (e.g., road networks) and cannot simply be extended to non-linear objects, thus missing information about most entities on the map; and (2) they are heuristic algorithmic approaches based on pre-defined rules, unable to learn entity matching in a data-driven manner. To address these limitations, we design KRAFT, a learning-based approach consisting of three parts: (1) Knowledge Graph Construction, where each GDB is represented by a knowledge graph; (2) Map Matching, where we use a knowledge graph alignment method together with a geospatial feature encoder to match entities in the obtained knowledge graphs; and (3) Map Merging, where we merge the entities matched by the previous modules in a consistent manner, using a mixed integer linear programming formulation that fully merges the GDBs without introducing any inconsistencies. Our experimental evaluation shows that not only does KRAFT achieve outstanding performance compared to state-of-the-art and baseline methods in map conflation tasks, but each of its modules (e.g., Map Matching and Map Merging) also separately outperforms traditional matching and merging methods.
Addressing the Distortion of Community Representations in Anomaly Detection on Attributed Networks
Anomaly detection on attributed networks, especially in the unsupervised scenario, has garnered significant attention, and Contrastive Learning (CL)-based methods have emerged as among the state-of-the-art approaches for this task. However, existing CL-based methods face a critical challenge: anomalous nodes infiltrate the sampled local communities, leading to a distortion of community representations that fundamentally limits discriminative ability. Our theoretical analysis reveals that this distortion is caused by two main mechanisms, cross contamination and aggregation bias, and that the key oversight is treating all community members equally while ignoring the relative reliability of nodes. To address these issues, we propose a CL-based ANomaly detectIon Method on Attributed networks targeted at mitigating community distortions to enhance anomaly discrimination (ANIMA for short), which incorporates a Truncation-Restriction community encoder (TRC-Encoder) with an elaborate heuristic prior instruction to detect and suppress anomalous contributions during community representation learning. Comprehensive experiments on 7 datasets demonstrate that ANIMA outperforms 10 SOTA methods by 2.25-8.8% AUC, validating the effectiveness of our approach in mitigating community distortions and enhancing anomaly discrimination.
EEG-FSL: An EEG-Based Few-Shot Learning Framework for Music Recommendation
Brain-computer interfaces based on electroencephalogram (EEG) signals have demonstrated significant potential for capturing users' implicit preferences, offering an innovative technique for music recommendation. However, we face two key challenges: (1) ineffective distinction of complex neural patterns in EEG signals, and (2) the cold-start problem caused by limited user EEG samples. To address these issues, we present EEG-FSL, a novel framework that integrates model-agnostic meta-learning (MAML) with dual-path neural feature extraction for music recommendation. EEG-FSL applies an attention-enhanced EEG encoder to extract meaningful patterns from brain signals through complementary pathways: one pathway retains temporal and phase information, while the other focuses on extracting common frequency-domain features. Furthermore, we utilize contrastive learning to explore the intrinsic structure of the data, significantly improving the model's feature-differentiation ability. Additionally, we propose a meta-learning method that allows EEG-FSL to quickly adapt to new users using only a small number of EEG samples, effectively addressing the cold-start problem. Extensive experiments conducted on a real-world dataset demonstrate the effectiveness of the proposed method. Specifically, in few-shot scenarios, compared to the best baseline, our approach improves mean squared error in score prediction by 8.4% and classification accuracy by 16.8%. Consequently, our work provides a practical solution for next-generation brain-computer interface applications, capable of delivering highly personalized content recommendations while minimizing user data collection requirements. Our code is available at https://anonymous.4open.science/r/EEG-FSL-code-72F3/.
Improving the Safety of Medication Recommendation via Graph Augmented Patient Similarity Network
Recommending optimal medication combinations for patients is a crucial application of artificial intelligence in healthcare. Recent works typically use patients' electronic health records combined with their current health conditions. However, these efforts have the following issues: 1) they often reference historical visits unrelated to the current situation, and 2) there is a latent risk of side effects from historical prescriptions. Such issues raise concerns about the safety of medication recommendation. To address this, we propose GPSRec, a novel Graph augmented Patient Similarity network for medication Recommendation. By leveraging dual similarity measures to selectively integrate historical visits, GPSRec effectively filters out irrelevant information, improving the accuracy of recommendation. We further present a training strategy that combines a pre-training method with a dual-threshold loss adjustment, reducing the risk of adverse drug-drug interactions and enhancing the safety of recommendation. Extensive experimental results on two real datasets demonstrate that GPSRec significantly outperforms state-of-the-art methods. Notably, it achieves 30.11% and 24.92% improvements in safety on the two datasets, respectively, along with higher accuracy.
LinkGPT: Leveraging Large Language Models for Enhanced Link Prediction in Text-Attributed Graphs
Inspired by the success of Large Language Models (LLMs) in language and vision tasks, there has been growing interest in applying LLMs to graph tasks, particularly on Text-Attributed Graphs (TAGs). However, most prior work tackles the node classification task. In this work, we evaluate an LLM's ability to reason over structured data and infer new facts based on learned patterns by focusing on link prediction (LP), the task of predicting missing links between nodes, which is understudied in the literature. This task poses two key challenges: (1) how to effectively integrate pairwise structural information, which is crucial for LP performance, into LLMs, and (2) how to address the computational bottleneck during inference. To tackle these challenges, we propose LinkGPT, the first LLM-based training and inference framework specifically designed for LP on homogeneous TAGs. To enhance the LLM's ability to understand the underlying structure, we carefully design a node encoder and a pairwise encoder, and leverage two-stage instruction tuning to effectively incorporate the node-wise and pairwise information into LLMs. For inference efficiency, we introduce a retrieval-reranking scheme. Extensive experiments show that LinkGPT achieves state-of-the-art performance on real-world graphs and demonstrates superior zero-shot and few-shot generalization. At inference time, it achieves a 10× speedup while maintaining high LP accuracy.
FakeChain: Exposing Shallow Cues in Multi-Step Deepfake Detection
Multi-step or hybrid deepfakes, created by sequentially applying different deepfake creation methods such as Face-Swapping, GAN-based generation, and Diffusion methods, pose an emerging and unforeseen technical challenge for detection models trained on single-step forgeries. While prior studies have mainly focused on detecting isolated single-step manipulations, little is known about detector behavior under such compositional, hybrid, and complex manipulation pipelines. In this work, we introduce FakeChain, a large-scale benchmark comprising 1-, 2-, and 3-step forgeries synthesized using five representative state-of-the-art generators. Using this benchmark, we analyze detection performance and spectral properties across hybrid manipulations at different steps, along with varying generator combinations and quality settings. Surprisingly, our findings reveal that detection performance depends heavily on the final manipulation type, with the F1-score dropping by up to 58.83% when it differs from the training distribution. This clearly demonstrates that detectors rely on last-stage artifacts rather than cumulative manipulation traces, limiting generalization. These findings highlight the need for detection models to explicitly consider manipulation history and sequences, and underscore the importance of benchmarks such as FakeChain that reflect the growing synthesis complexity and diversity of real-world scenarios. Our sample code is available at https://github.com/minjihh/FakeChain.
LeadFairRec: LLM-enhanced Discriminative Counterfactual Debiasing for Two-sided Fairness in Recommendation
Fairness-aware recommendation has emerged as a pivotal research area in recent years. Current fairness studies primarily examine two independent dimensions: user-side fairness and item-side fairness. However, most approaches address each side's fairness in isolation while neglecting their complex interdependencies. In this paper, we propose an LLM-Enhanced DiscriminAtive Counterfactual Debiasing Model for Two-sided Fairness in Recommendation (LeadFairRec). Specifically, we first design a two-sided causal graph that jointly models provider-customer fairness interactions through their causal relationships. Then we propose a discriminative counterfactual debiasing method, which effectively removes spurious correlations while maintaining true user-item interactions. Finally, we propose an LLM-enhanced counterfactual inference method to derive noise-resistant user/item representations from interaction data, enhancing the robustness of causal debiasing. The experimental results demonstrate the high effectiveness of our proposed model. We provide our code at https://github.com/houyimin660/LeadFairRec.
Model-Agnostic Iterative Graph Diversification for Improving Learning to Solve Graph Optimization Problems
Learning to solve graph optimization problems has recently attracted much research attention. However, most recent machine learning approaches to graph optimization problems employ graph generators to randomly generate training graphs, which may lead to overfitting and deteriorate the model's generalization. To tackle this issue, we observe that enhancing the diversity of training graphs is a crucial factor in improving the model's performance. Therefore, in this paper, we formulate a new research problem, named Graph Augmentation for Diversity Maximization (GRAM), to maximize training graph diversity by performing graph modifications. We first analyze the NP-hardness of GRAM. We then propose a 2-approximation algorithm and formally analyze its performance guarantee. Experimental results on well-known graph optimization problems show that our proposed approach significantly outperforms baselines such as graph augmentation and deep learning-based graph generation approaches.
GRIT: An Accurate and Efficient Graph Stream Summarization for Temporal Query
Graph stream summarization refers to the technique of processing graph streams (unbounded sequences of edges) by constructing compressed representations that support approximate queries on both graph topology and temporal information in computing power networks. However, existing methods struggle to achieve accurate and efficient temporal queries due to two key limitations: (1) inefficient integration of temporal information, leading to high latency in both edge processing and query execution; and (2) redundant multilayer structures that accumulate errors, significantly reducing query accuracy. In this paper, we propose GRIT, an accurate and efficient Graph stReam summarIzation for Temporal query. GRIT introduces a new structure, FlatIndex, which organizes temporal information in a flattened form and plays a critical role in minimizing error accumulation and ensuring accurate temporal queries. To further enhance edge-processing efficiency, we introduce a lazy update strategy, which updates only a single element in the FlatIndex upon edge insertion, significantly reducing insertion latency. Moreover, our greedy-based decomposition (GBD) algorithm decomposes the target query range into the minimal number of intervals corresponding to the FlatIndex, enabling efficient execution of temporal queries over arbitrary time ranges. Extensive experiments on five real-world datasets demonstrate that GRIT improves query accuracy by 2-3 orders of magnitude, while reducing query latency by 1-2 orders of magnitude and increasing throughput by 7-13 times compared to state-of-the-art methods.
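The abstract does not specify how the FlatIndex partitions time, so the sketch below illustrates one plausible greedy decomposition of a query range into a minimal set of power-of-two-aligned intervals; `dyadic_decompose` is a hypothetical helper for illustration, not GRIT's actual GBD algorithm.

```python
def dyadic_decompose(start: int, end: int):
    """Greedily cover [start, end) with the fewest power-of-two-aligned
    intervals. This is only one plausible instantiation of a greedy range
    decomposition over a flat temporal index; GRIT's actual FlatIndex layout
    is not described in the abstract."""
    intervals = []
    while start < end:
        # Largest block size aligned at `start` ...
        size = start & -start if start else 1 << end.bit_length()
        # ... that does not overshoot the query range.
        while size > end - start:
            size //= 2
        intervals.append((start, start + size))
        start += size
    return intervals

# Example: a query over time slots [13, 42) is covered by 5 aligned intervals
# instead of 29 individual slots.
print(dyadic_decompose(13, 42))
# [(13, 14), (14, 16), (16, 32), (32, 40), (40, 42)]
```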
Distributed Computation of k-Vertex Connected Components in Large Scale Networks
Recently, k-vertex connected component (k-VCC) detection has gained significant attention in graph analysis owing to its ability to capture structural cohesion. A k-VCC remains connected even after the removal of any k-1 vertices from it. The k-VCC has broad applications across multiple domains, such as social network analysis, cybersecurity, and bioinformatics. Yet, existing exact k-VCC detection algorithms require repeated computation of minimum vertex cuts, imposing a prohibitive computational cost on large-scale graphs. In this paper, we present an approximate algorithm for k-VCC detection that leverages Monte Carlo sampling to accelerate minimum vertex cut computation with a theoretical guarantee. Further, we design a distributed algorithm for mining all k-VCCs, named DkVCC. DkVCC adopts a divide-and-conquer strategy, decomposing the problem into smaller subgraph mining tasks that can be executed concurrently. Specifically, we generate tasks from individual vertices to construct initial subgraphs, and then iteratively expand and merge the subgraphs to form the final k-VCCs. Extensive experiments on 5 large real datasets demonstrate the efficiency of our proposed algorithms. For example, we achieve a 4× runtime speedup on the LiveJournal dataset, with 3.99M vertices and 34.7M edges, in a 3-node cluster.
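As a minimal, definition-level illustration (not the DkVCC algorithm itself), the snippet below checks whether a graph satisfies the k-vertex-connectivity condition stated above, using NetworkX's vertex-connectivity routine; maximality of the component is not checked.

```python
import networkx as nx

def is_k_vertex_connected(G: nx.Graph, k: int) -> bool:
    """Definitional check: a graph remains connected after removing any k-1
    vertices iff its vertex connectivity is at least k. This only tests the
    connectivity condition of a candidate subgraph; DkVCC's Monte Carlo cut
    sampling and distributed expansion/merging are not shown."""
    return nx.node_connectivity(G) >= k

# A 5-clique is 4-vertex-connected, so it passes the check for k = 3.
print(is_k_vertex_connected(nx.complete_graph(5), 3))  # True

# A path falls apart after removing a single internal vertex.
print(is_k_vertex_connected(nx.path_graph(6), 2))      # False
```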
Mixture of Semantic and Spatial Experts for Explainable Traffic Prediction
To satisfy the growing demand for traffic prediction driven by urbanization, intelligent transportation systems have integrated various cutting-edge artificial intelligence technologies, with large language models (LLMs) as a representative example. However, existing methods are mostly confined to shallow LLM utilization, in which the semantic capacity of LLMs is ignored and traffic data are fed in directly. Furthermore, the modality diversity of different traffic prediction scenarios (e.g., flow, speed, and demand) remains underexplored, which restricts model flexibility toward downstream applications. To mitigate these limitations, we propose a Mixture of Semantic and Spatial Experts (SS-MoE) for traffic prediction, along with human-intelligible post-hoc result explanation. Specifically, to enrich the traffic predictor with abundant semantic information, we design hierarchically coarse- and fine-grained prompts including role assignments, dataset descriptions, and background supplements, which serve as auxiliary knowledge for downstream prediction. Afterwards, considering the diversity of real-world traffic scenarios, we construct an MoE framework consisting of a spatial expert, a semantic expert, and a general expert, which account for node-level features, semantic representations, and overall generalization, respectively. Finally, we instruct the LLM to explain and analyze the final prediction, which provides insightful conclusions and supports intelligent transportation decisions, forming a unified prediction-explanation pipeline. Extensive experiments on five public traffic datasets demonstrate the superiority of SS-MoE across three traffic prediction tasks. Experimental results indicate that the MAE and RMSE of SS-MoE are reduced by up to 4.04% and 3.20%, respectively, compared with those of the runner-up.
Revisiting the Inner Product Method: Optimizing Sparse Matrix Multiplication via Set Intersection
Sparse matrices are extensively used to model interactions between entities and facilitate computations in neural networks. Sparse Matrix Multiplication (SpGEMM) serves as a fundamental operation in graph algorithms, social network analysis, and deep learning, attracting considerable research interest. Among the four primary paradigms for defining sparse matrix multiplication, the Inner Product (IP) method most closely aligns with the standard definition of matrix multiplication. However, due to its limited data reuse and reliance on index matching, the IP method has been rarely explored in the literature. This paper investigates the strong connection between SpGEMM and set intersection computation, introducing a hybrid sparse matrix multiplication algorithm that builds upon the numerical computation of the IP method. By leveraging the IP method's advantages-such as minimal intermediate results and high flexibility-our approach effectively enhances computational efficiency. Experimental evaluations on benchmark datasets demonstrate the superiority of the proposed algorithm, particularly in scenarios where the resulting matrix exhibits high sparsity. Furthermore, our method proves effective in several applications, including self-transpose multiplication and sparse matrix multiplications in graph neural networks.
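To make the connection between the inner-product method and set intersection concrete, the following sketch computes a single output entry as a two-pointer intersection over sorted nonzero-index lists; it illustrates the general idea only and is not the paper's hybrid algorithm.

```python
def sparse_dot(idx_a, val_a, idx_b, val_b):
    """One inner-product entry C[i][j] = <A[i,:], B[:,j]> computed as a sorted
    two-pointer set intersection over the nonzero index lists, which is the
    link between the IP method and set-intersection computation highlighted
    in the abstract. Index lists must be sorted in ascending order."""
    i = j = 0
    acc = 0.0
    while i < len(idx_a) and j < len(idx_b):
        if idx_a[i] == idx_b[j]:          # shared nonzero position
            acc += val_a[i] * val_b[j]
            i += 1
            j += 1
        elif idx_a[i] < idx_b[j]:
            i += 1
        else:
            j += 1
    return acc

# Row of A with nonzeros at columns {1, 4, 7}; column of B with nonzeros at {4, 7, 9}.
print(sparse_dot([1, 4, 7], [2.0, 3.0, 1.0], [4, 7, 9], [5.0, 2.0, 4.0]))  # 3*5 + 1*2 = 17.0
```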
CS-Agent: LLM-based Community Search via Dual-agent Collaboration
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks, yet their application to graph structure analysis, particularly community search, remains underexplored. Community search, a fundamental task in graph analysis, aims to identify groups of nodes with dense interconnections, which is crucial for understanding the macroscopic structure of graphs. In this paper, we propose GraphCS, a comprehensive benchmark designed to evaluate the performance of LLMs in community search tasks. Our experiments reveal that while LLMs exhibit preliminary potential, they frequently fail to return meaningful results and suffer from output bias. To address these limitations, we introduce CS-Agent, a dual-agent collaborative framework that enhances LLM-based community search. CS-Agent leverages the complementary strengths of two LLMs acting as Solver and Validator. Through iterative feedback and refinement, CS-Agent dynamically refines initial results without fine-tuning or additional training. After the multi-round dialogue, a Decider module selects the optimal community. Extensive experiments demonstrate that CS-Agent significantly improves the quality and stability of identified communities compared to baseline methods. To our knowledge, this is the first work to apply LLMs to community search, bridging the gap between LLMs and graph analysis while providing a robust and adaptive solution for real-world applications.
Geometric Heterogeneous Graph Neural Network for Protein-Ligand Binding Affinity Prediction
Accurately predicting protein-ligand binding affinity (PLA) remains a critical challenge in structure-based drug discovery. Recent advances have focused on using geometry-aware graph neural networks to model the three-dimensional (3D) structure of protein-ligand complexes for PLA prediction. However, they still achieve suboptimal performance due to two potential issues: 1) representing the protein-ligand complex as a homogeneous graph ignores the inherent difference between intra- and intermolecular interactions, limiting the expressive ability of the models; and 2) given that the geometric complementarity between the ligand and the protein binding pocket is a fundamental determinant of binding strength, the incomplete exploitation of geometric information constrains predictive performance. In this study, we propose a novel Geometric Heterogeneous Graph Neural Network (GeoHGN) for PLA prediction. Specifically, we consider complete geometries to characterize the directions of edges in coordinate space through quantum-inspired basis functions. To sufficiently incorporate the 3D information and heterogeneous topology of the complexes, we elaborately design a novel heterogeneous directional message passing mechanism (HDMP), which enables the propagation and aggregation of messages from intra- and intermolecular neighbors along with the directional information of the linked edges. Extensive benchmarking experiments demonstrate the superiority of GeoHGN in predicting PLA.
Flexiffusion: Training-Free Segment-Wise Neural Architecture Search for Efficient Diffusion Models
Diffusion models (DMs) are powerful generative models capable of producing high-fidelity images but are constrained by high computational costs due to iterative multi-step inference. While Neural Architecture Search (NAS) can optimize DMs, existing methods are hindered by retraining requirements, exponential search complexity from step-wise optimization, and slow evaluation relying on massive image generation. To address these challenges, we propose Flexiffusion, a training-free NAS framework that jointly optimizes generation schedules and model architectures without modifying pre-trained parameters. Our key insight is to decompose the generation process into flexible segments of equal length, where each segment dynamically combines three step types: full (complete computation), partial (cache-reused computation), and null (skipped computation). This segment-wise search space reduces the candidate pool exponentially compared to step-wise NAS while preserving architectural diversity. Further, we introduce relative FID (rFID), a lightweight evaluation metric for NAS that measures divergence from a teacher model's outputs instead of ground truth, slashing evaluation time by over 90%. In practice, Flexiffusion achieves at least 2× acceleration across LDMs, Stable Diffusion, and DDPMs on ImageNet and MS-COCO, with FID degradation under 5%, outperforming prior NAS and caching methods. Notably, it attains 5.1× speedup on Stable Diffusion with near-identical CLIP scores. Our work pioneers a resource-efficient paradigm for searching high-speed DMs without sacrificing quality.
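A back-of-the-envelope illustration of the claimed exponential reduction, under the simplifying assumption that each unit independently picks one of the three step types (full, partial, null); the segment constraints actually used by Flexiffusion are not reproduced here, and the numbers are illustrative.

```python
# Illustrative comparison of candidate-pool sizes for step-wise vs segment-wise
# search, assuming each unit independently chooses one of three step types.
num_steps = 100          # e.g. a 100-step sampler (illustrative)
segment_length = 10      # steps grouped into equal-length segments
num_segments = num_steps // segment_length

step_wise_candidates = 3 ** num_steps        # one choice per step
segment_wise_candidates = 3 ** num_segments  # one pattern per segment

print(f"step-wise:    3^{num_steps} ~ {step_wise_candidates:.2e}")
print(f"segment-wise: 3^{num_segments} ~ {segment_wise_candidates:.2e}")
```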
Enhancing Multimodal Entity Linking via Distillation and Multimodal Large Language Models
Multimodal entity linking (MEL) aims to link ambiguous multimodal mentions to their corresponding entities in a multimodal knowledge graph. Although many existing methods have been dedicated to exploring fine-grained intra- and cross-modal interactions between mentions and entities and have achieved good results, the discrepancies between the data distributions in training and real-world applications, as well as noisy one-hot labels, still impede the generalization of MEL models, leading to poor performance on unseen entities. Although general-purpose multimodal large language models (MLLMs) are powerful, applying them directly to the MEL task is costly and time-consuming. To address these issues, we propose a Distillation-Enhanced framework for Multimodal Entity Linking (DEMEL). During training, DEMEL takes the best MEL model trained so far as the teacher model and distills its knowledge into the student model, i.e., the MEL model of the current iteration, while training it with one-hot labels. This imposes regularization on the model, balances bias and variance in the training process, and improves the generalization ability of the MEL model. Moreover, DEMEL employs an MLLM to selectively rerank predictions for uncertain samples in the inference phase, improving accuracy while minimizing invocation costs. Extensive experiments on three public MEL datasets demonstrate that DEMEL outperforms state-of-the-art baselines, achieving a 3.27% improvement with MLLM reranking for just 8.59% of test samples, and up to 4.8% H@1 enhancement in low-resource settings even without MLLM reranking.
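The abstract does not give the exact distillation objective, so the sketch below shows a generic self-distillation loss in the same spirit: cross-entropy on (possibly noisy) one-hot labels plus a KL term toward a frozen teacher, with illustrative values for the mixing weight `alpha` and `temperature`.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      alpha=0.5, temperature=2.0):
    """Generic self-distillation objective in the spirit of the abstract:
    cross-entropy on one-hot labels plus a KL term pulling the student toward
    a frozen teacher (the best model trained so far). alpha and temperature
    are illustrative hyperparameters, not values from the paper."""
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return (1.0 - alpha) * ce + alpha * kd

# Toy usage: 4 candidate entities per mention, batch of 2.
student = torch.randn(2, 4, requires_grad=True)
teacher = torch.randn(2, 4)            # produced by the frozen teacher
labels = torch.tensor([1, 3])          # one-hot targets as class indices
loss = distillation_loss(student, teacher, labels)
loss.backward()
```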
Hyperspherical Dynamic Multi-Prototype with Arguments Dependencies and Role Consistency for Event Argument Extraction
Event Argument Extraction (EAE) aims to identify arguments and assign them to predefined roles within a document. Existing methods face challenges in modeling intra-class variance and inter-class ambiguity, hindering accurate role assignment. Inspired by how humans dynamically adjust classification criteria while maintaining category consistency (e.g., distinguishing ''Victim'' and ''Attacker'' roles based on contextual relationships), we propose HDMAR (Hyperspherical Dynamic Multi-Prototype with Arguments Dependencies and Role Consistency), in which three innovations tackle these challenges: (1) hyperspherical dynamic multi-prototype learning captures intra-role diversity and enforces inter-role separation via hyperspherical optimization and optimal transport; (2) cross-event role consistency aligns role representations across events; and (3) an argument-dependencies-guided encoding module enhances contextual understanding of intra-event and inter-event dependencies. Experiments on RAMS and WikiEvents demonstrate gains in accuracy, with further analysis validating the contributions of each module.
OFIA: An Object-centric Fine-grained Alignment Enhancement for Video-Text Retrieval
Text-video alignment is crucial for text-video retrieval. Empirical studies have suggested that coarse-grained alignment overlooks rich cross-modal details and therefore performs worse than fine-grained alignment. In the current transfer learning paradigm, researchers primarily utilize patch-level or frame-level embeddings as the fine-grained video representation for alignment; however, frame embeddings miss informative visual details, while a single patch captures only limited local detail. These defects hinder further improvement in video-text retrieval. To address the defects of existing fine-grained alignment approaches, this paper proposes the Object-centric Fine-grained Alignment Enhancement for Video-Text Retrieval, namely OFIA, which consists of a text-guided object-text alignment module and a similarity-wise frame aggregation module to enhance video-text alignment. The text-guided object-text alignment module leverages textual descriptions to detect and extract more relevant objects, enabling more precise local similarities within video frames. However, not all frames contribute equally to effective alignment. The similarity-wise frame aggregation module assigns greater importance to informative frames, overcoming the challenge of insignificant ones and optimizing the similarity of matched video-text pairs. Empirical evaluation on benchmark datasets including MSRVTT, MSVD, DiDeMo, and ActivityNet demonstrates state-of-the-art performance of the proposed method.
Unlocking the Potential of Smaller Language Models as Superior Instruction Evolvers
Instruction tuning has become a cornerstone for unlocking the full potential of large language models. Among the key factors, complex and diverse instructions play a crucial role in aligning these models with a wide range of downstream tasks. However, current methodologies for constructing large-scale instruction datasets tend to favor powerful models, such as GPT-4, based on the empirical assumption that larger models inherently possess superior capabilities. In this study, we challenge this prevailing assumption and delve into the untapped potential of smaller language models (SLMs) in the context of instruction evolution. Through extensive experiments across three distinct scenarios of instruction evolution, we find that SLMs can generate more effective instructions than their larger counterparts. Further analysis reveals that SLMs exhibit a broader output space during instruction evolution, leading to the creation of more complex and diverse instructional variants. Additionally, we observe that existing evaluation metrics fall short in capturing the nuanced impact of instructions. To address this limitation, we propose Instruction Complexity-Aware IFD (IC-IFD), an enhanced framework that incorporates instruction complexity into the original IFD score. This approach enables a more accurate assessment of the effectiveness of instruction data, paving the way for more refined instruction tuning strategies.
Scaling Trust: Veracity-Driven Defect Detection in Entity Search
Veracity is a critical dimension of data quality that directly impacts a wide range of tasks. In entity search scenarios, Knowledge Graphs (KGs) such as DBpedia and Wikidata serve as core resources for accessing factual content. The veracity of these KGs is therefore essential for ensuring the reliability and trustworthiness of retrieved entities -- factors that directly influence user confidence in the search system. However, ensuring the truthfulness of entities remains a major challenge due to the complexities associated with the scale, development, and maintenance of KGs. This paper critically analyzes the impact of veracity in entity search, using DBpedia as the underlying KG. To this end, we introduce eRank, a veracity-driven re-ranking strategy that enhances entities' trustworthiness without sacrificing the ranking's overall relevance. Furthermore, we propose the Active Learning-based verAcity-Driven Defect IdentificatioN (ALADDIN) system, a lightweight and scalable framework for veracity-driven defect detection. ALADDIN identifies incorrect KG facts and exhibits high effectiveness in downstream entity-centric tasks, such as entity summarization, entity card generation, and defect recommendation.
Deep Modality-Disentangled Prompt Tuning for Few-Shot Multimodal Sarcasm Detection
The growing use of multimodal content on social media has sparked interest in sarcasm detection for better opinion mining. However, current models depend heavily on large datasets, limiting their adaptability to real-world scenarios with limited labeled data. Therefore, an in-depth exploration of the problem in a few-shot setting is necessary. We propose DMDP (Deep Modality-Disentangled Prompt Tuning), a novel approach designed for few-shot multimodal sarcasm detection. Previous few-shot approaches relied on shallow, unified prompts across modalities, limiting their ability to model the nuanced and diverse nature of sarcasm. In contrast, we propose gated modality-weighted dual prompts that are disentangled across text and visual encoders, injected at deeper layers to enable hierarchical feature aggregation and identify distinct sarcasm cases. A prompt-sharing mechanism across the layers of each encoder facilitates the capture of low-level cues to high-level semantics, while a cross-modal prompt alignment facilitates subtle visual-textual interactions, allowing the model to capture complex sarcastic cues better. Extensive experiments on two public datasets show the superiority of our model over baselines in the few-shot and extremely low-resource scenarios. To further validate our model's effectiveness, we conduct a cross-dataset evaluation on two public datasets, where it consistently outperforms baselines, highlighting strong generalization. Our code will be available at https://github.com/mr-perplexed/dmdp.
Streamlining Feature Interactions via Selectively Crossing Vectors for Click-Through Rate Prediction
Previous Click-Through Rate (CTR) prediction models rely on enumerating high-order feature combinations up to a fixed order, limiting expressiveness and scalability. Recent studies have explored arbitrary-order interaction modeling through two major paradigms: log-based and graph-based methods. However, both paradigms suffer from inherent weaknesses: log-based methods lack stability, and graph-based methods lack generalizability, as both attempt to model overly diverse combinations of features, many of which may be noisy or redundant. This observation provokes a central question: what if only a small set of core interactions is sufficient? To explore this, we progressively mask feature interactions and find that removing up to 90% of them results in negligible performance degradation, suggesting that most interactions are unnecessary. Motivated by this finding, we propose SCV (Selectively Crossing Vectors), a CTR prediction framework that reformulates feature interaction learning as a sparse edge selection task over a globally shared feature-interaction graph. By modeling feature interactions over a globally learned graph and dynamically fusing expert outputs in an instance-aware manner, SCV effectively leverages global consistency and local adaptability. We further introduce a label-biased self-distillation objective to mitigate the effects of noisy supervision and stabilize training. Experiments on public CTR benchmarks show that SCV achieves state-of-the-art performance while reducing computational cost by up to 66%, validating the effectiveness of globally sparse yet locally adaptive interaction modeling. All code is available at: https://github.com/bw-99/scv.
Leveraging Vulnerabilities in Temporal Graph Neural Networks via Strategic High-Impact Assaults
Temporal Graph Neural Networks (TGNNs) have become indispensable for analyzing dynamic graphs in critical applications such as social networks, communication systems, and financial networks. However, the robustness of TGNNs against adversarial attacks, particularly sophisticated attacks that exploit the temporal dimension, remains a significant challenge. Existing attack methods for Spatio-Temporal Dynamic Graphs (STDGs) often rely on simplistic, easily detectable perturbations (e.g., random edge additions/deletions) and fail to strategically target the most influential nodes and edges for maximum impact. We introduce the High Impact Attack (HIA), a novel restricted black-box attack framework specifically designed to overcome these limitations and expose critical vulnerabilities in TGNNs. HIA leverages a data-driven surrogate model to identify structurally important nodes (central to network connectivity) and dynamically important nodes (critical for the graph's temporal evolution). It then employs a hybrid perturbation strategy, combining strategic edge injection (to create misleading connections) and targeted edge deletion (to disrupt essential pathways), maximizing TGNN performance degradation. Importantly, HIA minimizes the number of perturbations to enhance stealth, making it more challenging to detect. Comprehensive experiments on five real-world datasets and four representative TGNN architectures (TGN, JODIE, DySAT, and TGAT) demonstrate that HIA significantly reduces TGNN accuracy on the link prediction task, achieving up to a 35.55% decrease in Mean Reciprocal Rank (MRR) - a substantial improvement over state-of-the-art baselines. These results highlight fundamental vulnerabilities in current STDG models and underscore the urgent need for robust defenses that account for both structural and temporal dynamics. Code and Data are available at https://github.com/ryandhjeon/hia.
Identifying Critical Segments Affecting Piano Performance Evaluation
Effective evaluation of expressive music performance should not only assess technical and interpretive quality but also support efficient practice by identifying musically critical segments. However, traditional expert feedback is often limited by accessibility and scalability. To address this, we investigate how musically critical segments can be automatically identified and interpreted to support personalized assessment of piano performance. We propose a framework that identifies expressive deviations in piano performance by comparing extracted feature values with reference guidelines derived from music sheet analysis and expert annotations. Critical segments are detected via SHAP-based feature importance and change-point analysis, while interpretable annotations are generated using a large language model conditioned on feature descriptions and quantified deviations. We evaluate our framework on the PercePiano dataset and newly collected annotations, showing consistent improvements in predicted overall performance scores after applying the generated annotations. Annotations generated using a fine-tuned feature extraction model improved predicted scores by up to 8.76%, with greater alignment with expert labels in both segment coverage and overlap. SHAP-based analysis confirms that the model identifies musically important features, enhancing both the interpretability of the annotations and their relevance to musical evaluation. Our results demonstrate that the proposed framework produces interpretable, musically meaningful annotations aligned with expert evaluations, and can serve as a foundation for scalable, AI-assisted music education and assessment. Our code is available at: https://github.com/Hyerim-Jeon/critical_segments.
ST-LINK: Spatially-Aware Large Language Models for Spatio-Temporal Forecasting
Traffic forecasting represents a crucial problem within intelligent transportation systems. In recent research, Large Language Models (LLMs) have emerged as a promising method, but their intrinsic design, tailored primarily for sequential token processing, introduces notable challenges in effectively capturing spatial dependencies. Specifically, the inherent limitations of LLMs in modeling spatial relationships and their architectural incompatibility with graph-structured spatial data remain largely unaddressed. To overcome these limitations, we introduce ST-LINK, a novel framework that enhances the capability of Large Language Models to capture spatio-temporal dependencies. Its key components are Spatially-Enhanced Attention (SE-Attention) and the Memory Retrieval Feed-Forward Network (MRFFN). SE-Attention extends rotary position embeddings to integrate spatial correlations as direct rotational transformations within the attention mechanism. This approach maximizes spatial learning while preserving the LLM's inherent sequential processing structure. Meanwhile, MRFFN dynamically retrieves and utilizes key historical patterns to capture complex temporal dependencies and improve the stability of long-term forecasting. Comprehensive experiments on benchmark datasets demonstrate that ST-LINK surpasses conventional deep learning and LLM approaches, and effectively captures both regular traffic patterns and abrupt changes.
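As a hedged sketch of the rotary idea mentioned above, the code below applies a standard rotary-style rotation to query/key vectors, with the rotation angle driven by a per-node scalar; how SE-Attention actually derives these angles from spatial correlations is not described in the abstract, so a plain node index stands in for it here.

```python
import torch

def rotary_rotate(x: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
    """Standard rotary-style rotation of query/key vectors, with the rotation
    angle driven by a per-node scalar `positions`. SE-Attention reportedly
    derives these angles from spatial correlations; that mapping is not given
    in the abstract, so an arbitrary node index is used purely to illustrate
    the rotation mechanics. x: (num_nodes, dim) with even dim."""
    dim = x.size(-1)
    half = dim // 2
    inv_freq = 1.0 / (10000 ** (torch.arange(half, dtype=torch.float32) / half))
    angles = positions[:, None].float() * inv_freq[None, :]       # (n, half)
    cos, sin = torch.cos(angles), torch.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Toy usage: rotate 8-dimensional query vectors for 5 sensor nodes.
q = torch.randn(5, 8)
node_positions = torch.arange(5)
q_rot = rotary_rotate(q, node_positions)
print(q_rot.shape)  # torch.Size([5, 8])
```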
Parameter-Efficient Transfer Learning for EEG Foundation Models via Task-Relevant Feature Focusing
Electroencephalogram (EEG)-based brain-computer interfaces face challenges from insufficient data for training task-specific neural networks and from limited generalizability across subjects. To address the issue of data scarcity in EEG research, transfer learning using EEG foundation models (EFMs) has recently gained attention for its ability to leverage prior knowledge. Although transfer learning with EFMs enables tasks to be performed with limited training data, the increasing size of these models presents significant computational challenges. Parameter-efficient transfer learning (PETL) methods address this computational issue by tuning only a small subset of parameters of the pre-trained model. However, existing PETL methods mostly fail to account for the high-dimensional nature of EEG data, which limits their ability to fully leverage the prior knowledge of the EFM when applied to downstream tasks. To address these challenges, we propose a novel PETL method with a TASk-relevanT fEature Focusing modULe (TASTEFUL) to transfer EFMs efficiently. TASTEFUL is designed to focus on task-relevant features and efficiently learn representations tailored to downstream tasks. We evaluate TASTEFUL on tasks using publicly available EEG datasets, demonstrating its superior performance. Our work highlights TASTEFUL's potential to enhance the practical application of EFMs, marking a significant advancement in PETL for EFMs.
Entity-Aware Generative Retrieval for Personalized Contexts
Given a user query containing ambiguous and user-specific references, how can we effectively retrieve personalized information? Personalized information retrieval (PIR) requires resolving context-dependent cues such as nicknames, personal locations, or temporal expressions. This poses challenges for conventional retrievers, including dense and generative models, which often struggle with entity ambiguity and generalization to user-specific contexts. In this paper, we propose PEARL (Personalized Entity-Aware Generative RetrievaL), a novel generative retrieval framework for personalized IR. PEARL addresses key challenges through three components: (i) entity-aware annotation with span-level regularization to reduce lexical sensitivity, (ii) prefix-based contrastive learning to capture structural alignment between lexically divergent query-passage pairs, and (iii) context diversification to improve robustness against user-specific variations. Empirical results on both an existing PIR dataset and our new large-scale synthetic benchmark PAIR show that PEARL consistently outperforms strong baselines under zero-shot evaluation. Notably, PEARL achieves state-of-the-art performance in Hits@1 and MRR@10, demonstrating its effectiveness for retrieval in personalized user contexts. Our dataset is available at https://www.github.com/pearl-pair/pearl.
Seeing Through the Blur: Unlocking Defocus Maps for Deepfake Detection
The rapid advancement of generative AI has enabled the mass production of photorealistic synthetic images, blurring the boundary between authentic and fabricated visual content. This challenge is particularly evident in deepfake scenarios involving facial manipulation, but also extends to broader AI-generated content (AIGC) cases involving fully synthesized scenes. As such content becomes increasingly difficult to distinguish from reality, the integrity of visual media is under threat. To address this issue, we propose a physically interpretable deepfake detection framework and demonstrate that defocus blur can serve as an effective forensic signal. Defocus blur is a depth-dependent optical phenomenon that naturally occurs in camera-captured images due to lens focus and scene geometry. In contrast, synthetic images often lack realistic depth-of-field (DoF) characteristics. To capture these discrepancies, we construct a defocus blur map and use it as a discriminative feature for detecting manipulated content. Unlike RGB textures or frequency-domain signals, defocus blur arises universally from optical imaging principles and encodes physical scene structure. This makes it a robust and generalizable forensic cue. Our approach is supported by three in-depth feature analyses, and experimental results confirm that defocus blur provides a reliable and interpretable cue for identifying synthetic images. We aim for our defocus-based detection pipeline and interpretability tools to contribute meaningfully to ongoing research in media forensics. The implementation is publicly available at: https://github.com/irissun9602/Defocus-Deepfake-Detection
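The abstract does not detail how the defocus map is constructed, so the snippet below shows one common baseline proxy, local variance of the Laplacian, merely to illustrate what a per-pixel sharpness/defocus signal looks like; it is not the paper's pipeline.

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def defocus_proxy_map(gray: np.ndarray, window: int = 15) -> np.ndarray:
    """Crude defocus-blur proxy: local variance of the Laplacian response over
    a sliding window. Sharp (in-focus) regions score high, defocused regions
    score low. This is only a common baseline for blur mapping; the paper's
    actual defocus-map construction is not described in the abstract."""
    lap = laplace(gray.astype(np.float64))
    mean = uniform_filter(lap, size=window)
    mean_sq = uniform_filter(lap ** 2, size=window)
    return mean_sq - mean ** 2            # local variance of the Laplacian

# Toy usage on a synthetic image: a sharp random texture next to a flat region.
img = np.zeros((64, 128))
img[:, :64] = np.random.rand(64, 64)      # "in focus" textured half
blur_map = defocus_proxy_map(img)
print(blur_map[:, :64].mean() > blur_map[:, 64:].mean())  # True
```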
Local Large Language Models for Recommendation
Unlike traditional classification tasks, recommendation is inherently subjective: whether an item should be suggested depends not only on user preferences and item semantics, but also on latent behavioral patterns and contextual cues. While recent LLM-based recommenders excel at modeling semantics and intent through generative reasoning, they often fail to capture collaborative signals and suffer from inefficiencies when applied globally across large interaction spaces. We propose Local Large Language Models for Recommendation (L3Rec), a novel model-agnostic framework that integrates collaborative filtering (CF) with generative LLMs through localized modeling. Our approach first applies a lightweight CF model to derive user and item embeddings, then clusters them into behaviorally coherent subgroups. Each cluster is assigned a dedicated generative LLM, referred to as a local LLM, trained only on its corresponding data subset. This enables fine-grained personalization while improving training efficiency through parallelism. At inference time, predictions from local models are aggregated via a fusion strategy, with a global CF fallback when needed. To the best of our knowledge, this is the first LLM-based recommendation framework to incorporate local collaborative structure. Experiments show that it achieves state-of-the-art performance with significantly better scalability and efficiency.
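As a minimal sketch of the localization step (clustering CF embeddings into behaviorally coherent subgroups and routing users to a per-cluster model), the code below uses random stand-in embeddings and scikit-learn's KMeans; the per-cluster LLMs, fusion strategy, and CF fallback are not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

# Cluster CF user embeddings into subgroups, then route each user to the model
# dedicated to its cluster. Embeddings are random stand-ins for illustration.
rng = np.random.default_rng(0)
user_embeddings = rng.normal(size=(1000, 64))    # from a lightweight CF model

n_clusters = 8
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(user_embeddings)

def route(user_idx: int) -> int:
    """Return the index of the local model responsible for this user."""
    return int(cluster_ids[user_idx])

print(f"user 42 is served by local model #{route(42)}")
```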
Frequency-Conditioned Diffusion Models for Time Series Generation
Time series data, widely used in fields such as climate studies, finance, and healthcare, often face scarcity in rare scenarios and privacy concerns, prompting growing interest in time series synthesis. Diffusion models have shown strong potential for generating high-quality data, but challenges remain in capturing long-range dependencies and complex patterns. We propose a novel diffusion model that integrates time-domain information with rich frequency-domain features, accounting for differences in noise decay rates across frequencies. Instead of arbitrary frequency splits used in prior works, we partition components based on spectral density, model them separately within the denoising backbone, and fuse them with time-domain features. This enables effective capture of both global and local patterns, enhancing representation of high- and low-frequency information. Extensive experiments on multiple public datasets show promising performance, and analyses including long-term generation and ablation studies demonstrate the model's ability to learn and represent complex time series distributions.
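Since the exact spectral-density-based partition rule is not given in the abstract, the sketch below shows one plausible data-driven split: keep the lowest frequencies that account for a chosen share of spectral energy as the "dominant" component and treat the remainder as the "residual"; the threshold and terminology are assumptions, not the paper's.

```python
import numpy as np

def split_by_spectral_density(x: np.ndarray, energy_ratio: float = 0.9):
    """Partition a 1-D series into 'dominant' and 'residual' components by
    keeping the lowest frequencies that account for `energy_ratio` of the
    spectral energy. One plausible data-driven split in the spirit of the
    abstract; the paper's exact partition rule is not specified there."""
    spec = np.fft.rfft(x)
    power = np.abs(spec) ** 2
    cum = np.cumsum(power) / power.sum()
    cutoff = int(np.searchsorted(cum, energy_ratio)) + 1

    low = spec.copy()
    low[cutoff:] = 0
    high = spec - low
    return np.fft.irfft(low, n=len(x)), np.fft.irfft(high, n=len(x))

# Toy usage: slow sine plus fast jitter; the split separates the two scales.
t = np.linspace(0, 1, 256, endpoint=False)
series = np.sin(2 * np.pi * 2 * t) + 0.2 * np.sin(2 * np.pi * 40 * t)
low, high = split_by_spectral_density(series)
print(np.allclose(low + high, series))   # True: the decomposition is exact
```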
From Anchors to Answers: A Novel Node Tokenizer for Integrating Graph Structure into Large Language Models
Enabling large language models (LLMs) to effectively process and reason with graph-structured data remains a significant challenge despite their remarkable success in natural language tasks. Current approaches either convert graph structures into verbose textual descriptions, consuming substantial computational resources, or employ complex graph neural networks as tokenizers, which introduce significant training overhead. To bridge this gap, we present NT-LLM, a novel framework with an anchor-based positional encoding scheme for graph representation. Our approach strategically selects reference nodes as anchors and encodes each node's position relative to these anchors, capturing essential topological information without the computational burden of existing methods. Notably, we identify and address a fundamental issue: the inherent misalignment between discrete hop-based distances in graphs and continuous distances in embedding spaces. By implementing a rank-preserving objective for positional encoding pretraining, NT-LLM achieves superior performance across diverse graph tasks ranging from basic structural analysis to complex reasoning scenarios. Our comprehensive evaluation demonstrates that this lightweight yet powerful approach effectively enhances LLMs' ability to understand and reason with graph-structured information, offering an efficient solution for graph-based applications of language models.
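A minimal sketch of anchor-based positional encoding: each node is described by its hop distances to a handful of anchor nodes. Anchor selection here is arbitrary (NT-LLM selects anchors strategically), and the rank-preserving pretraining objective is not shown.

```python
import networkx as nx
import numpy as np

def anchor_distance_encoding(G: nx.Graph, anchors) -> np.ndarray:
    """Represent each node by its hop distances to a small set of anchor
    nodes. Unreachable nodes keep an infinite distance. This illustrates only
    the positional-encoding idea; anchor selection and the rank-preserving
    pretraining used by NT-LLM are not reproduced here."""
    n = G.number_of_nodes()
    enc = np.full((n, len(anchors)), np.inf)
    for j, a in enumerate(anchors):
        for node, d in nx.single_source_shortest_path_length(G, a).items():
            enc[node, j] = d
    return enc

# Toy usage on a small graph with two hand-picked anchors.
G = nx.karate_club_graph()
encoding = anchor_distance_encoding(G, anchors=[0, 33])
print(encoding.shape)        # (34, 2): one 2-dim position per node
print(encoding[16])          # hop distances from node 16 to anchors 0 and 33
```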
Adapting Large Language Models to Log Analysis with Interpretable Domain Knowledge
Log analysis represents a critical sub-domain within AI applications that facilitates automatic approaches to fault and error management of large-scale software systems, saving the labor of traditional manual methods. While existing solutions using large language models (LLMs) show promise, they are limited by a significant domain gap between natural and log languages (the latter contains rich domain-specific tokens such as status codes, IP addresses, and resource paths), which restricts their effectiveness in real-world applications. However, directly adapting general-purpose LLMs to log analysis using raw logs may degrade their performance due to inconsistent token distributions. In this paper, we present a domain adaptation approach that addresses these limitations by integrating interpretable domain knowledge into open-source LLMs through continual pre-training (CPT), bridging the domain gap by adapting LLMs on interpretable natural-language texts with log knowledge (instead of raw logs) to reduce the distribution discrepancy. To achieve this, we developed NLPLog, a comprehensive dataset containing over 250,000 question-answer pairs on log-related knowledge. Our resulting model, SuperLog, achieves the best performance across four log analysis tasks, with an average accuracy improvement of 12.01% over the second-best model. An ablation study also suggests the advantages of domain adaptation using interpretable log knowledge over using raw logs.
SELF: Surrogate-light Feature Selection with Large Language Models in Deep Recommender Systems
Feature selection is crucial in recommender systems for improving model efficiency and predictive performance. Conventional approaches typically employ surrogate models-such as decision trees or neural networks-to estimate feature importance. However, their effectiveness is inherently constrained, as these models may struggle under suboptimal training conditions, including feature collinearity, high-dimensional sparsity, and insufficient data. In this paper, we propose SELF, a SurrogatE-Light Feature selection method for deep recommender systems. SELF integrates semantic reasoning from Large Language Models (LLMs) with task-specific learning from surrogate models, enabling an automated and lightweight feature selection process. Specifically, LLMs first produce a semantically informed ranking of feature importance, which is subsequently refined by a surrogate model, effectively integrating general world knowledge with task-specific learning. Comprehensive experiments on three public datasets from real-world recommender platforms validate the effectiveness of SELF. To facilitate reproducibility, our code is publicly available.
SUMMA: A Multimodal Large Language Model for Advertisement Summarization
Understanding multimodal video ads is crucial for improving query-ad matching and relevance ranking on short video platforms, enhancing advertising effectiveness and user experience. However, the effective utilization of multimodal information with high commercial value has long been inadequate, largely constrained by reliance on highly compressed video embeddings. To address this, we propose SUMMA (short for SUmmarizing MultiModal Ads), a multimodal model that automatically processes video ads into summaries highlighting the content of highest commercial value, thus improving their comprehension and ranking in Douyin search-advertising systems. SUMMA is developed via a two-stage training strategy, multimodal supervised fine-tuning followed by reinforcement learning with a mixed reward mechanism, on domain-specific data containing video frames and ASR/OCR transcripts, generating commercially valuable and explainable summaries. We integrate SUMMA-generated summaries into our production pipeline, directly enhancing the candidate retrieval and relevance ranking stages in real search-advertising systems. Both offline and online experiments show substantial improvements over baselines, with online results indicating a statistically significant 1.5% increase in advertising revenue. Our work establishes a novel paradigm for condensing multimodal information into representative texts, effectively aligning visual ad content with user query intent in retrieval and recommendation scenarios.
Chart-CoCa: Self-Improving Chart Understanding of Vision LMs via Code-Driven Synthesis and Candidate-Conditioned Answering
Vision Language Models (VLMs) often struggle with chart understanding tasks, particularly with accurate chart description and complex reasoning. Synthetic data generation is a promising solution, but it usually faces the challenge of noisy labels. To address this challenge, we first introduce a chart synthesis pipeline that generates aligned chart-question-answer triplets through code generation and execution, ensuring the reliability of synthetic data without human intervention. Furthermore, inspired by test-time scaling, which increases the inference budget and thereby improves performance, we design a candidate-conditioned answering process. The VLM first generates multiple responses per query, and then synthesizes the final answer by contextualizing these candidates. Experiments demonstrate significant improvements, with up to 15.50 points of accuracy gain over the initial VLM, in a fully self-improving paradigm without either human-labeled data or external models.
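As a sketch of the candidate-conditioned answering process, the snippet below samples several draft answers and then asks the same model to synthesize a final answer from them; `generate` is a hypothetical stand-in for a VLM call, not a real API, and the prompt wording is illustrative.

```python
from typing import Callable, List

def candidate_conditioned_answer(question: str,
                                 generate: Callable[[str], str],
                                 num_candidates: int = 4) -> str:
    """Sketch of candidate-conditioned answering: sample several responses,
    then ask the same model to synthesize a final answer conditioned on those
    candidates. `generate` is a hypothetical stand-in for a VLM/LLM call."""
    candidates: List[str] = [generate(question) for _ in range(num_candidates)]
    listing = "\n".join(f"Candidate {i + 1}: {c}" for i, c in enumerate(candidates))
    synthesis_prompt = (
        f"Question: {question}\n"
        f"Here are {num_candidates} draft answers:\n{listing}\n"
        "Considering these drafts, give the single best final answer."
    )
    return generate(synthesis_prompt)

# Toy usage with a dummy generator that always answers the same thing.
print(candidate_conditioned_answer("What is the peak value in the chart?",
                                   generate=lambda prompt: "42"))
```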
Community-Aware Social Community Recommendation
Social recommendation, which seeks to leverage social ties among users to alleviate the sparsity issue of user-item interactions, has emerged as a popular technique for elevating personalized services in recommender systems. Despite being effective, existing social recommendation models are mainly devised for recommending regular items such as blogs, images, and products, and largely fail for community recommendation due to overlooking the unique characteristics of communities. Distinct from regular items, communities are constituted by individuals, who exhibit high dynamicity and are associated with rich structural patterns in social networks. To our knowledge, limited research has been devoted to comprehensively exploiting this information for recommending communities. To bridge this gap, this paper presents CASO, a novel and effective model specially designed for social community recommendation. Under the hood, CASO harnesses three carefully crafted encoders for user embedding, two of which extract community-related global and local structures from the social network via social modularity maximization and social closeness aggregation, while the third captures user preferences using collaborative filtering with observed user-community affiliations. To further eliminate feature redundancy therein, we introduce a mutual exclusion between social and collaborative signals. Finally, CASO includes a community detection loss in the model optimization, thereby producing community-aware embeddings for communities. Our extensive experiments evaluating CASO against nine strong baselines on six real-world social networks demonstrate its consistent and remarkable superiority over the state of the art in terms of community recommendation performance.
Hierarchy-Consistent Learning and Adaptive Loss Balancing for Hierarchical Multi-Label Classification
Hierarchical Multi-Label Classification (HMC) faces critical challenges in maintaining structural consistency and balancing loss weighting in Multi-Task Learning (MTL). To address these issues, we propose HCAL, an MTL-based classifier that integrates prototype contrastive learning and adaptive task-weighting mechanisms. The most significant advantage of our classifier is semantic consistency: prototypes explicitly model labels, and features are aggregated from child classes to parent classes. The other important advantage is an adaptive loss-weighting mechanism that dynamically allocates optimization resources by monitoring task-specific convergence rates, effectively resolving the ''one-strong-many-weak'' optimization bias inherent in traditional MTL approaches. To further enhance robustness, a prototype perturbation mechanism injects controlled noise into prototypes to expand decision boundaries. Additionally, we formalize a quantitative metric, the Hierarchical Violation Rate (HVR), to evaluate hierarchical consistency and generalization. Extensive experiments across three datasets demonstrate both the higher classification accuracy and the reduced hierarchical violation rate of the proposed classifier over baseline models.
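As a concrete illustration of the adaptive loss-weighting idea described above, the sketch below rebalances task weights from monitored convergence rates. The weighting rule, window size, and names are illustrative assumptions, not the HCAL implementation.

```python
def adaptive_task_weights(loss_history, window=5, eps=1e-8):
    """Illustrative convergence-rate-based task weighting (assumed scheme).

    loss_history maps each task name to its recent loss values. Tasks whose
    loss is decreasing slowly receive larger weights, so optimization effort
    is shifted away from tasks that already converge quickly.
    """
    rates = {}
    for task, losses in loss_history.items():
        recent = losses[-window:]
        # relative improvement over the window; a small value means slow convergence
        rates[task] = max(recent[0] - recent[-1], 0.0) / (abs(recent[0]) + eps)
    inv = {task: 1.0 / (rate + eps) for task, rate in rates.items()}
    total = sum(inv.values())
    return {task: weight / total for task, weight in inv.items()}

# toy usage: the parent-level task converges quickly, the leaf-level task stalls
weights = adaptive_task_weights({
    "parent_level": [1.0, 0.7, 0.5, 0.35, 0.25],
    "leaf_level":   [1.2, 1.15, 1.12, 1.10, 1.09],
})
print(weights)  # the leaf-level task receives the larger weight
```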
Exploring the Tradeoff Between Diversity and Discrimination for Continuous Category Discovery
Continuous category discovery (CCD) aims to automatically discover novel categories in continuously arriving unlabeled data. This is challenging because neither the number of categories nor any labels are available in the newly arrived data, while catastrophic forgetting must also be mitigated. Most CCD methods cannot handle the contradiction between novel class discovery and classification well, and they are prone to accumulating errors while gradually discovering novel classes. Moreover, most of them rely on knowledge distillation and data replay to prevent forgetting, occupying more storage space. To address these limitations, we propose Independence-based Diversity and Orthogonality-based Discrimination (IDOD). IDOD mainly includes an independent enrichment of diversity module, a joint discovery of novelty module, and a continuous increment by orthogonality module. In independent enrichment, the backbone is trained separately with a contrastive loss to prevent it from focusing only on features useful for classification. Joint discovery transforms multi-stage novel class discovery into a single stage, reducing the impact of error accumulation. The continuous increment by orthogonality module generates mutually orthogonal prototypes for classification and prevents forgetting with lower space overhead via representative representation replay. Experimental results show that on challenging fine-grained datasets, our method outperforms state-of-the-art methods.
Balance and Brighten: A Twin-Propeller Network to Release Potential of Physics Laws for Traffic State Estimation
Traditional physics-informed deep learning combines data-driven methods with model-based methods by incorporating a physics loss as a constraint in the total loss function, aiming to force the neural network to behave according to physical laws. However, this approach severely underestimates the potential of physical knowledge. First, the physical knowledge fails to deliver its intended effect because the physics loss can have an extremely small magnitude, more fluctuating convergence rates, and gradient directions that conflict with the data loss. Second, existing methods implicitly employ physics laws as auxiliary terms, ignoring that explicitly exploiting certain properties of physics laws can compensate for the shortcomings of data-driven models, particularly with regard to data noise and the relationships between variables. To alleviate these issues, we propose a Twin-Propeller Network (TPN) that realizes full message exchange between physical knowledge and data information, releasing the potential of the physics laws. In practice, we independently train a data-driven model and a physics-based model as two student models to keep their information separate. Considering the measurement noise present in the data-driven model and the relative robustness of the physics-based model, we quantify the data uncertainty and use it as a weight to balance the two students in an integrated robust teacher model. The stronger teacher in turn transfers the respective knowledge back to each student, where we propose traffic state relation distillation and physical knowledge distillation to guide the training of the data student and the physics student, respectively. Through extensive experiments on both synthetic and real-world datasets, our model demonstrates better performance than existing state-of-the-art methods.
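The following minimal sketch illustrates the general idea of an uncertainty-weighted teacher that blends the two student models; the specific weighting rule and the names used are assumptions for illustration, not the TPN formulation.

```python
import torch

def uncertainty_weighted_teacher(data_pred, physics_pred, data_log_var):
    """Blend two student predictions into one teacher prediction.

    data_pred, physics_pred: tensors of predicted traffic states.
    data_log_var: predicted log-variance of the data student (heteroscedastic
    uncertainty); higher uncertainty shifts weight toward the physics student.
    The weighting rule below is illustrative, not the paper's exact formulation.
    """
    data_var = torch.exp(data_log_var)
    w_data = 1.0 / (1.0 + data_var)          # confident data student -> weight near 1
    teacher = w_data * data_pred + (1.0 - w_data) * physics_pred
    return teacher, w_data

# toy usage: the first entry trusts the data student, the second leans on physics
data_pred = torch.tensor([0.8, 0.5])
physics_pred = torch.tensor([0.6, 0.6])
teacher, w = uncertainty_weighted_teacher(data_pred, physics_pred, torch.tensor([-2.0, 2.0]))
print(teacher, w)
```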
Dangerous Language Habits! Exploiting Code-Mixing for Backdoor Attacks on NLP Models
Backdoor attacks threaten the reliability of NLP models by embedding hidden behaviors during training, which are activated by specific inputs at inference time. Traditional backdoor triggers often rely on explicit content alterations, such as token insertion or stylistic modification, which may compromise semantic coherence and be easily detected. In this work, we propose a novel backdoor attack strategy that leverages the linguistic properties of code-mixing (a language form that combines elements from two or more languages) as implicit triggers. Drawing inspiration from natural code-mixed communication, we design three types of linguistically grounded triggers: inter-word mixing, intra-sentential mixing, and inter-sentential mixing. These forms reflect realistic language usage patterns in bilingual communities, enhancing the stealthiness of the attack. The experimental results show that existing NLP models perform poorly when faced with backdoor attacks based on code-mixing triggers. To our knowledge, we are the first to focus on code-mixing as a trigger for text backdoor attacks. We hope this research raises awareness of the vulnerability of models to code-mixing during training.
Point-DMAE: Point Cloud Self-supervised Learning via Density-directed Masked Autoencoders
Masked autoencoders have been extensively utilized in 3D point cloud self-supervised learning, where the fundamental approach is to mask a portion of the point cloud and subsequently reconstruct it; this process is hypothesized to enhance model learning by leveraging the inherent structure of the point cloud data. However, the information density within point clouds is inherently uneven, in contrast to the more uniform distributions found in language and 2D image data. This uneven distribution suggests that random masking strategies, commonly adopted from NLP and 2D vision, may not be optimal for point cloud data, potentially leading to suboptimal learning outcomes. Based on this observation, we propose a simple yet effective Density-directed Masked Autoencoder for Point Cloud Self-supervised Learning (Point-DMAE), which learns latent semantic point cloud features using a density-directed masking strategy. Specifically, our method employs a dual-branch Transformer architecture to extract both high-level and fine-grained point features through global and local block density-directed masking, respectively. Point-DMAE demonstrates high pre-training efficiency and significantly outperforms our baseline (Point-MAE) on 3D object classification tasks within the ScanObjectNN dataset by 4.13% on OBJ-BG, 5.17% on OBJ-ONLY, and 4.17% on PB-T50-RS. Code is available at https://github.com/jinxianglong10/Point-DMAE.
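To make the density-directed masking idea more tangible, the sketch below samples patches for masking with probability tied to an estimated local density; the density estimate and the direction of the bias are assumptions for illustration, not the Point-DMAE procedure.

```python
import numpy as np

def density_directed_mask(patch_centers, mask_ratio=0.6, k=8, seed=0):
    """Illustrative density-directed masking over point-cloud patches.

    patch_centers: (P, 3) array of patch center coordinates. Local density is
    estimated from mean k-NN distances; patches in denser regions are drawn
    for masking with higher probability (an assumed mapping, shown only to
    contrast with uniform random masking).
    """
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(patch_centers[:, None, :] - patch_centers[None, :, :], axis=-1)
    knn_dist = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)   # mean distance to k nearest patches
    density = 1.0 / (knn_dist + 1e-8)
    prob = density / density.sum()
    n_mask = int(mask_ratio * len(patch_centers))
    masked = rng.choice(len(patch_centers), size=n_mask, replace=False, p=prob)
    mask = np.zeros(len(patch_centers), dtype=bool)
    mask[masked] = True
    return mask

mask = density_directed_mask(np.random.default_rng(1).normal(size=(64, 3)))
print(mask.sum(), "of", mask.size, "patches masked")
```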
Enhancing Information Diffusion Prediction via Multiple Granularity Hypergraphs and Position-aware Sequence Model
With the rise of social media, accurately predicting information diffusion has become crucial for a wide range of applications. Existing methods usually employ sequential hypergraphs to model users' latent interaction preferences and use self-attention mechanisms to capture dependencies among users. However, they typically focus on a single temporal scale and lack the ability to effectively model temporal influence, which limits their performance in diffusion prediction tasks. To address these limitations, we propose a novel method (MHPS) to enhance information diffusion prediction via multiple granularity hypergraphs and a position-aware sequence model. Specifically, MHPS constructs hypergraph sequences of different granularities by grouping user interactions according to various time intervals. Additionally, to further enhance the modeling of temporal influence, two types of cross-attention mechanisms, namely next-step positional cross-attention and source influence cross-attention, are introduced within the cascade representation. The next-step positional cross-attention captures target position awareness, while the source influence cross-attention focuses on the impact of the initial source. Then, gating mechanisms and GRUs are employed to fuse the different attention outputs and predict the next target user. Extensive experiments on real-world datasets demonstrate that MHPS achieves competitive performance against state-of-the-art methods, with average improvements of up to 7.82% in Hits@10 and 5.60% in MAP@100. Our code is available at https://github.com/cgao-comp/MHPS.
Rethinking the Training Paradigm of Discrete Token-Based Multimodal LLMs: An Analysis of Text-Centric Bias
Discrete token-based multimodal large language models (MLLMs), such as AnyGPT and MIO, integrate diverse modalities into an autoregressive framework by discretizing modality inputs into tokens compatible with language models. Unlike encoder-based approaches, such as LLaVA and Flamingo, which utilize pretrained modality-specific encoders, discrete token-based MLLMs simultaneously learn modality token representations and their alignment with the language, yet are exclusively trained on modality-text paired datasets without additional unimodal training. We identify a structural limitation inherent in this training paradigm, termed text-centric bias, defined as an over-reliance on the textual context that restricts intrinsic modality understanding. To systematically analyze the existence of this bias, we propose an analytical framework involving external perplexity-based and internal neuron-level analyses. Furthermore, to verify whether the bias originates from the paired-only training paradigm, we introduce an analytical methodology named Monotune, which is a simple unimodal training stage. Our analyses demonstrate that minimal exposure to unimodal data effectively mitigates text-centric bias, providing empirical evidence that the bias is fundamentally induced by the paired-only training strategy. Through comprehensive downstream task evaluations, we further reveal that this structural bias meaningfully affects real-world multimodal task performance, particularly under limited textual contexts. Our findings highlight a fundamental limitation in current discrete token-based MLLM training paradigms and suggest directions for future multimodal training strategies. Our code and experiments are available at https://github.com/41312432/Monotune
Generalizing Query Performance Prediction under Retriever and Concept Shifts via Data-driven Correction
Query Performance Prediction (QPP) aims to estimate the effectiveness of an information retrieval (IR) system without access to ground-truth relevance judgments. Existing supervised QPP methods typically follow a regression framework that maps query-document representations to target metrics such as RR@10 or nDCG@10. However, these approaches often suffer from degraded performance under concept shift, where the distribution of relevance given a query-document pair changes between training and test datasets. This paper proposes a novel classification-based framework, QPP-MLC (QPP Multi-Label Classification), which formulates QPP as a multi-label classification task. QPP-MLC infers the relevance of each document among the top-k retrieved results and aggregates these document-level relevance predictions to predict the overall query performance. As a result, QPP-MLC provides both a diagnostic tool for concept shift and a correction method under concept shift by modulating the classification threshold. Experiments on MS MARCO and TREC DL benchmarks show that QPP-MLC achieves strong prediction accuracy and outperforms traditional regression-based QPP methods.
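The sketch below illustrates the aggregation step in spirit: per-document relevance predictions for the top-k results are thresholded and rolled up into a query-level nDCG estimate. The binary-gain nDCG and the threshold are illustrative choices rather than QPP-MLC's exact aggregation.

```python
import numpy as np

def predicted_ndcg_at_k(doc_relevance_probs, k=10, threshold=0.5):
    """Turn per-document relevance predictions into a query-level nDCG@k estimate.

    doc_relevance_probs: predicted relevance probabilities for the top-k
    retrieved documents, in ranked order. Raising or lowering the threshold
    is one way a classification view allows correction under concept shift.
    """
    rel = (np.asarray(doc_relevance_probs[:k]) >= threshold).astype(float)
    discounts = 1.0 / np.log2(np.arange(2, len(rel) + 2))
    dcg = float((rel * discounts).sum())
    ideal = np.sort(rel)[::-1]
    idcg = float((ideal * discounts).sum())
    return dcg / idcg if idcg > 0 else 0.0

print(predicted_ndcg_at_k([0.9, 0.2, 0.7, 0.1, 0.6]))
```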
Efficiency Boost in Decentralized Optimization: Reimagining Neighborhood Aggregation with Minimal Overhead
In today's data-sensitive landscape, distributed learning emerges as a vital tool, not only fortifying privacy measures but also streamlining computational operations. This becomes especially crucial within fully decentralized infrastructures where local processing is imperative due to the absence of centralized aggregation. Here, we introduce DYNAWEIGHT, a novel framework for information aggregation in multi-agent networks. DYNAWEIGHT offers substantial acceleration in decentralized learning with minimal additional communication and memory overhead. Unlike traditional static weight assignments, such as Metropolis weights, DYNAWEIGHT dynamically allocates weights to neighboring servers based on their relative losses on local datasets. Consequently, it favors servers possessing diverse information, particularly in scenarios of substantial data heterogeneity. Our experiments on various datasets (MNIST, CIFAR10, and CIFAR100), incorporating various server counts and graph topologies, demonstrate notable enhancements in training speed. Notably, DYNAWEIGHT functions as an aggregation scheme compatible with any underlying server-level optimization algorithm, underscoring its versatility and potential for widespread integration.
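A minimal sketch of loss-driven neighbor weighting in this spirit appears below; the softmax form, temperature, and direction of the preference are assumptions for illustration, not the exact DYNAWEIGHT rule.

```python
import numpy as np

def loss_based_neighbor_weights(neighbor_losses, self_loss, temperature=1.0):
    """Assign aggregation weights to neighbors from their losses on the local dataset.

    Neighbors whose models perform worse on the local data are assumed to hold
    more complementary (diverse) information and receive larger weights, in
    contrast to static Metropolis-style weights that ignore model state.
    """
    losses = np.array([self_loss] + list(neighbor_losses))
    rel = (losses - self_loss) / temperature   # relative loss w.r.t. the local model
    w = np.exp(rel)
    return w / w.sum()                         # index 0 = self, remaining = neighbors

# toy usage: two neighbors, the second one holds very different data
print(loss_based_neighbor_weights(neighbor_losses=[0.4, 1.2], self_loss=0.3))
```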
Causal Effect Variational Transformer for Public Health Measures and COVID-19 Infection Cluster Analysis
Recent research increasingly integrates causal inference into deep learning models to enhance the explainability and robustness of medical applications. However, data scarcity remains a fundamental challenge due to privacy constraints and the high cost of data collection. This issue, compounded by complex variable dependencies and unobserved latent confounders, hinders the reliable estimation of causal effects. To address these challenges, we collect two real-world COVID-19 infection cluster datasets, including public health measures, from distinct distributions in collaboration with local governments, a medical university, and a hospital. We also propose a cut-off augmentation method that generates diverse feature-label pairs by slicing time-series sequences at different observation windows, effectively simulating partial observations common in real-world settings. We further introduce the Causal Effect Variational Transformer (CEVT), a Transformer-based model that captures temporal structure and addresses the difficulty of causal estimation under scarce data, complex dependencies, and latent confounding by modeling multiple treatments through an iterative conditioning mechanism. We validate the causal modeling capability of CEVT on synthetic datasets and demonstrate that, on two distinct COVID-19 datasets, it consistently outperforms baselines in infection prediction. Notably, the causal effects estimated by CEVT converge with findings from medical studies on infection control, reinforcing its reliability and underscoring its potential to inform public health decision-making.
Curriculum Guided Personalized Subgraph Federated Learning
Subgraph Federated Learning (FL) aims to train Graph Neural Networks (GNNs) across distributed private subgraphs, but it suffers from severe data heterogeneity. To mitigate data heterogeneity, weighted model aggregation methods personalize each local GNN by assigning larger weights to parameters from clients with similar subgraph characteristics inferred from their current model states. However, the sparse and biased subgraphs often trigger rapid overfitting, causing the estimated client similarity matrix to stagnate or even collapse. As a result, the aggregation loses effectiveness as clients reinforce their own biases instead of exploiting diverse knowledge otherwise available. To this end, we propose a novel personalized subgraph FL framework called Curriculum Guided Personalized SUbgraph Federated Learning (CUFL). On the client side, CUFL adopts Curriculum Learning (CL) that adaptively selects edges for training according to their reconstruction scores, exposing each GNN first to easier, generic cross-client substructures and only later to harder, client-specific ones. This paced exposure prevents early overfitting to biased patterns and enables gradual personalization. By regulating personalization, the curriculum also reshapes server aggregation from exchanging generic knowledge to propagating client-specific knowledge. Further, CUFL improves weighted model aggregation by estimating client similarity using fine-grained structural indicators reconstructed on a random reference graph. Extensive experiments on six benchmark datasets confirm that CUFL achieves superior performance compared to relevant baselines. Code is available at https://github.com/Kang-Min-Ku/CUFL.git.
Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge
Materials synthesis remains a critical bottleneck in developing innovations for energy storage, catalysis, electronics, and biomedical devices. Current synthesis design relies heavily on empirical trial-and-error methods guided by expert intuition, limiting the pace of materials discovery. To address this challenge, we present AlchemyBench, a comprehensive benchmark built upon a curated dataset of 17,667 expert-verified synthesis recipes from open-access literature. AlchemyBench provides an end-to-end framework that supports research in large language models (LLMs) applied to materials synthesis prediction. The benchmark encompasses four key tasks: raw materials and equipment prediction, synthesis procedure generation, and characterization outcome forecasting. To enable scalable evaluation, we propose an LLM-as-a-Judge framework that leverages large language models for automated assessment, demonstrating strong agreement with expert evaluations (e.g., Pearson's r = 0.80, Spearman's ρ = 0.78). Our experimental results reveal that reasoning-focused models (Claude 3.7, GPT-4o) achieve scores around 4.0 on well-documented oxide and organic synthesis targets, but performance drops by approximately 0.3 points on electrochemical workflows. Fine-tuning on AlchemyBench data enables a 7B-parameter open-source model to surpass generic baselines trained on 1M samples, while retrieval-augmented generation provides an additional +0.20 improvement when supplied with five high-similarity contexts. AlchemyBench addresses a critical gap in the field by providing the first comprehensive, legally redistributable benchmark for automated materials synthesis prediction. Our contributions establish a foundation for exploring LLM capabilities in predicting and guiding materials synthesis, ultimately accelerating experimental design and innovation in materials science.
Self-supervised Dual-view Framework with Tailored Negative Sampling for New Activity Detection
With the growing adoption of IoT, analyzing multi-modal sensor time series for human activity recognition has become crucial for intelligent and context-aware applications. While existing approaches assume a fixed set of known activities, real-world deployments encounter new activities not present in the training data. Detecting them is challenging due to overlapping patterns between known and new activities, high intra-class variability, and sensor heterogeneity across datasets. To address these challenges, we propose CLAN, a novel self-supervised dual-view framework for new activity detection. It employs a two-tower architecture that extracts discriminative representations from both time and frequency domains. By treating multiple strongly augmented views of known activity samples as negatives, the model learns invariant representations that effectively distinguish new activities. In addition, a dataset-aware augmentation selection mechanism adaptively determines transformations tailored to each dataset's characteristics, thereby enhancing generalization across diverse sensor environments. Extensive experiments on five real-world human activity datasets demonstrate that CLAN consistently outperforms new activity detection baselines, achieving up to 9.24% improvement in AUROC.
Exploring Diverse Sparse Network Structures via Dynamic Pruning with Weight Alignment
Deep neural networks (DNNs) often require a large number of parameters, which has led to the development of model pruning techniques that remove weight connections. In this paper, we propose a new method to maximize the effect of finding sparse patterns through a gradient scaling technique that modifies the weight distribution. This approach allows for the exploration of more diverse sparse patterns compared to traditional dynamic pruning methods, leading to the discovery of stable subnetworks. Through various experiments, we demonstrate the importance of exploration in finding better sparse patterns. We achieve state-of-the-art performance across multiple network architectures on datasets such as CIFAR-10/100 and ImageNet. The code is available at https://github.com/Acasia/DWA.
Leveraging Multi-facet Paths for Heterogeneous Graph Representation Learning
Recent advancements in heterogeneous GNNs have enabled significant progress in embedding nodes and learning relationships across diverse tasks. However, traditional methods rely heavily on meta-paths grounded in node types, which often fail to encapsulate the full complexity of node interactions, leading to inconsistent performance and elevated computational demands. To address these challenges, we introduce MF2Vec, a novel framework that shifts focus from rigid node-type dependencies to dynamically exploring shared facets across nodes, regardless of type. MF2Vec constructs multi-faceted paths and forms homogeneous networks to learn node embeddings more effectively. Through extensive experiments, we demonstrate that MF2Vec achieves superior performance in node classification, link prediction, and node clustering tasks, surpassing existing baselines. Furthermore, it exhibits reduced performance variability due to meta-path dependencies and achieves faster training convergence. These results highlight its capability to analyze complex networks comprehensively. The implementation of MF2Vec is publicly available at https://github.com/kimjongwoo-cell/MF2Vec.
From Patterns to Predictions: A Shapelet-Based Framework for Directional Forecasting in Noisy Financial Markets
Directional forecasting in financial markets requires both accuracy and interpretability. Before the advent of deep learning, interpretable approaches based on human-defined patterns were prevalent, but their structural vagueness and scale ambiguity hindered generalization. In contrast, deep learning models can effectively capture complex dynamics, yet often offer limited transparency. To bridge this gap, we propose a two-stage framework that integrates unsupervised pattern extraction with interpretable forecasting. (i) SIMPC segments and clusters multivariate time series, extracting recurrent patterns that are invariant to amplitude scaling and temporal distortion, even under varying window sizes. (ii) JISC-Net is a shapelet-based classifier that uses the initial part of extracted patterns as input and forecasts subsequent partial sequences for short-term directional movement. Experiments on Bitcoin and three S&P 500 equities demonstrate that our method ranks first or second in 11 out of 12 metric-dataset combinations, consistently outperforming baselines. Unlike conventional deep learning models that output buy-or-sell signals without interpretable justification, our approach enables transparent decision-making by revealing the underlying pattern structures that drive predictive outcomes.
A Self-Supervised Mixture-of-Experts Framework for Multi-behavior Recommendation
In e-commerce, where users face a vast array of possible item choices, recommender systems are vital for helping them discover suitable items they might otherwise overlook. While many recommender systems primarily rely on a user's purchase history, recent multi-behavior recommender systems incorporate various auxiliary user behaviors, such as item clicks and cart additions, to enhance recommendations. Despite their overall performance gains, their effectiveness varies considerably between visited items (i.e., those a user has interacted with through auxiliary behaviors) and unvisited items (i.e., those with which the user has had no such interactions). Specifically, our analysis reveals that (1) existing multi-behavior recommender systems exhibit a significant gap in recommendation quality between the two item types (visited and unvisited items) and (2) achieving strong performance on both types with a single model architecture remains challenging. To tackle these issues, we propose a novel multi-behavior recommender system, MEMBER. It employs a mixture-of-experts framework, with experts designed to recommend the two item types, respectively. Each expert is trained using a self-supervised method specialized for its design goal. In our comprehensive experiments, we show the effectiveness of MEMBER across both item types, achieving up to 65.46% performance gain over the best competitor in terms of Hit Ratio@20.
From Menus to the Interactive Food-Ordering Systems
Conversational interfaces have emerged as an accessible and user-friendly alternative to traditional touch-based self-service kiosks in food-ordering systems. Despite their promise, building such systems remains challenging due to the need for costly data annotation, store-specific model adaptation, and scalable deployment. In this study, we propose a fully automated, end-to-end framework that transforms structured menu databases into high-quality annotated datasets and efficiently deploys store-specific conversational models using a parameter-efficient fine-tuning method. Our approach fine-tunes only 0.9% of the backbone model parameters per store, enabling cost-effective and plug-and-play deployment across diverse environments. To enhance robustness, we further integrate a recommendation module that suggests alternative items when requested menu options are unavailable. Experimental results on data from 27 stores in South Korea demonstrate that our framework consistently outperforms existing data generation baselines in intent classification and slot filling performance, while maintaining high annotation quality. Simulated real-world voice-ordering scenarios confirm the practicality of our framework for rapid, scalable, and accessible deployment in real-world environments.
OASIS: Open-world Adaptive Self-supervised and Imbalanced-aware System
The expansion of machine learning into dynamic environments presents challenges in handling open-world problems where label shift, covariate shift, and unknown classes emerge concurrently. Post-training methods have been explored to address these challenges, adapting models to newly emerging data. However, these methods struggle when the initial pre-training is performed on class-imbalanced datasets, limiting generalization to minority classes. To address this, we propose OASIS, an Open-world Adaptive Self-supervised and Imbalanced-aware System. OASIS consists of two learning phases: pre-training and post-training. The pre-training phase aims to improve the classification performance of samples near class boundaries via a novel borderline sample refinement step. Notably, the borderline sample refinement step critically improves the robustness of the decision boundary in the representation space. Through this robustness of the pre-trained model, OASIS generates reliable pseudo-labels, adapting the model against open-world problems in the post-training phase. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art post-training techniques in both accuracy and efficiency across diverse open-world scenarios.
Online Activation Value-aware Clustering and Aggregation for Faithful Argumentative Explanations
Argumentative explainable artificial intelligence employs argumentation theory to explain the mechanisms of machine learning. Previous approaches for explaining deep learning models collectively compressed layers via clustering. However, this resulted in accumulated information loss across layers, thereby degrading the fidelity of explanations. We propose online activation value-aware clustering and aggregation, a compression algorithm that preserves the inference structure of the original neural network with greater fidelity. The proposed method sequentially compresses each layer, immediately recalculates activation values following compression, and rectifies inter-layer information loss using a singular-value-scaled ridge alignment approach. To evaluate the effectiveness of the proposed method, we introduce four novel quantitative metrics. Input-output fidelity and structural fidelity measure how accurately the compressed model preserves the original model predictions and internal activations. Input-output perturbation consistency and structural perturbation consistency assess the similarity of the changes induced by Gaussian-perturbed input data. Experiments on three benchmark datasets (Breast Cancer, California Housing, and HIGGS) show that our method achieves performance improvements ranging from 12.9% to 53.7% across the four metrics, demonstrating significantly higher explanation fidelity than existing approaches.
Where Does Legal AI Fail? Evaluating RAG Pipelines
Retrieval Augmented Generation (RAG) frameworks have been integrated into LLMs for QA tasks, enabling lower levels of hallucination during the generation stage. Despite ample research in evaluating and detecting hallucination, research regarding its source in the legal domain is still minimal. In this paper, we break down each component of the RAG framework and analyze how they affect the quality of the generation. We use the expert answers of 1,039 real Korean civil complaints provided by the Korean Ministry of Economy and Finance to analyze the quality of generation across 12 experimental configurations that vary in retrieval, reranking, and generation components. To evaluate generation quality, we employ an LLM-as-a-Judge approach, comparing model outputs against expert responses across three dimensions: conclusion consistency, correctness, and incorrectness. We validate our findings through logistic regression analysis, establishing statistical relationships between component metrics (Recall@200, NDCG@20) and generation consistency. Despite differing performance across models, two patterns emerge from the generation evaluation: increased retrieval performance correlates with decreased generation quality, while increased reranker performance correlates with higher generation quality. Our work questions the conventional wisdom that ''more context is better'' and provides a systematic attribution framework for finding the origin of hallucination in legal RAG systems.
Constructing Set-Compositional and Negated Representations for First-Stage Ranking
Set compositional and negated queries are crucial for expressing complex information needs and enable the discovery of niche items like ''Books about non-European monarchs''. Despite the recent advances in LLMs, first-stage ranking remains challenging due to the requirement of encoding documents and queries independently from each other. This limitation calls for constructing compositional query representations that encapsulate logical operations or negations, and can be used to match relevant documents effectively. In the first part of this work, we explore constructing such representations in a zero-shot setting using vector operations between lexically grounded Learned Sparse Retrieval (LSR) representations. Specifically, we introduce Disentangled Negation that penalizes only the negated parts of a query, and a Combined Pseudo-Term approach that enhances LSR's ability to handle intersections. We find that our zero-shot approach is competitive and often outperforms retrievers fine-tuned on compositional data, highlighting certain limitations of LSR and Dense Retrievers. Finally, we address some of these limitations and improve LSR's representation power for negation by allowing it to assign negative term scores and effectively penalize documents containing the negated terms.
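The zero-shot construction can be pictured as simple vector arithmetic over sparse term weights. The sketch below, with assumed term counts in place of learned LSR weights and an assumed penalty value, shows how penalizing only the negated terms demotes documents that contain them.

```python
from collections import Counter

def disentangled_negation_query(positive_terms, negated_terms, penalty=1.0):
    """Build a sparse (term -> weight) query vector that penalizes only negated terms.

    Term weights here are plain counts for illustration; in a real LSR setup
    they would come from the learned sparse encoder, and the penalty magnitude
    would be tuned rather than fixed.
    """
    q = Counter(positive_terms)
    for t in negated_terms:
        q[t] = q.get(t, 0) - penalty          # negative weight pushes matching docs down
    return dict(q)

def score(query_vec, doc_vec):
    """Dot-product scoring between sparse query and document vectors."""
    return sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())

q = disentangled_negation_query(["books", "monarchs"], negated_terms=["european"])
doc_match = {"books": 1.0, "monarchs": 1.0}
doc_negated = {"books": 1.0, "monarchs": 1.0, "european": 1.0}
print(score(q, doc_match), score(q, doc_negated))   # the document with the negated term scores lower
```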
FairRegBoost: An End-to-End Data Processing Framework for Fair and Scalable Regression
Fairness-aware machine learning has gained significant attention due to the growing demand for ethical decision-support systems. This paper introduces FairRegBoost, a novel fairness-aware regression framework that takes a holistic data management perspective by integrating automated data preparation, uncertainty modeling, and post-processing adjustments using optimal transport techniques into an effective and efficient solution. Our approach effectively balances predictive accuracy and fairness by minimizing the output distribution distance between protected groups, leveraging uncertainty and sample similarities to guide the transport. We conduct extensive experiments on real-world datasets with both single and multiple protected attributes. Results demonstrate that FairRegBoost consistently achieves superior fairness-accuracy trade-offs compared to state-of-the-art approaches. Moreover, our scalability analysis highlights its computational efficiency, making it a practical choice for large-scale applications.
D-HAT: Dynamic Hypergraph Representation Learning with Attention-Based Multi-Level Hypergraph Sampling
Hypergraph Neural Networks (HNNs) leverage higher-order interactions in graph-structured data to enable effective representation learning across a wide range of applications. Because full hypergraphs can incorporate irrelevant relations, it is crucial to adopt a hypergraph sampling method that efficiently captures substructures while preserving representational quality. However, existing hypergraph sampling methods that target only nodes or hyperedges suffer from subgraph disconnection issues and neglect node importance, owing to the randomness in sampling and the use of static computational sub-hypergraphs. In this paper, we propose D-HAT, a hypergraph learning framework that dynamically constructs representative sub-hypergraphs through a novel attention-based multi-level hypergraph sampling strategy during the training of HNNs. To prioritize informative neighbors and enhance the representational quality of sub-hypergraphs during training, we develop a new attention-based HNN incorporating attention-guided aggregation and dense skip connections. To the best of our knowledge, this paper is the first to quantitatively compare various hypergraph sampling methods for hypergraph representation learning. Experiments on real-world graph datasets demonstrate the effectiveness of D-HAT, which consistently achieves higher accuracy compared to existing hypergraph sampling methods.
Advanced Privacy Protection in Federated Learning using Server-initiated Homomorphic Encryption
Federated learning (FL) has been widely adopted to provide machine learning (ML) privacy, protecting sensitive user data from leakage. However, there are still attacks that can exploit FL to access users' sensitive data, such as model inversion attacks, property inference attacks, and membership inference attacks. Various solutions have been proposed to secure FL using privacy-preserving techniques such as differential privacy, homomorphic encryption, and multi-party encryption. However, existing solutions often add noise to the model that hinders accuracy, or introduce large computational overhead that makes them impractical to use. In this paper, we propose a new privacy protection scheme for FL that uses homomorphic encryption (HE), noise, and secret sharing to protect users' sensitive data from collusion among up to n-2 adversarial clients and the server. The computational overhead is minimised by transferring expensive HE computations to the server, requiring only encryption and homomorphic addition to be carried out by clients. We provide proof sketches to validate the security of our scheme, and experimental results to demonstrate its practicality. The results show that our scheme adds at most 8% overhead to base FL models without any loss of accuracy, regardless of the data used.
Amortized Baseline Selection via Rank-Revealing QR for Efficient Model Explanation
Model-agnostic explanation methods are essential for interpreting machine learning models, but they suffer from prohibitive computational costs that scale with the number of baselines. Existing acceleration approaches either lack a theoretical basis or provide no principled guidance for baseline selection. To address this gap, we present ABSQR (Amortized Baseline Selection via Rank-Revealing QR), a framework that exploits the low-rank structure of value matrices to accelerate multi-baseline attribution methods. Our approach combines deterministic baseline selection via SVD-guided QR decomposition with an amortized inference mechanism that utilizes cluster-based retrieval. We reduce computational complexity from O(m · 2^d) to O(k · 2^d), where k ≪ m. Experiments demonstrate that ABSQR achieves a 91.2% agreement rate with full baseline methods while providing an 8.5× speedup across diverse datasets. As the first acceleration approach that preserves explanation error guarantees under computational speedup, ABSQR makes the practical deployment of interpretable AI systems feasible at scale.
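The baseline-selection step can be illustrated with an off-the-shelf column-pivoted (rank-revealing) QR factorization, as sketched below; the matrix construction and the choice of k are assumptions, and the snippet is not the ABSQR pipeline itself.

```python
import numpy as np
from scipy.linalg import qr

def select_baselines_rrqr(value_matrix, k):
    """Pick k representative baselines via column-pivoted (rank-revealing) QR.

    value_matrix: (n_features, n_baselines) matrix whose columns are per-baseline
    value vectors. Column pivoting orders columns by how much new linearly
    independent information each adds, so the first k pivots serve as a compact
    baseline subset when the matrix is approximately low rank.
    """
    _, _, piv = qr(value_matrix, pivoting=True, mode='economic')
    return piv[:k]

# toy usage: 6 baselines whose value vectors span only ~2 directions
rng = np.random.default_rng(0)
basis = rng.normal(size=(10, 2))
V = basis @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(10, 6))
print(select_baselines_rrqr(V, k=2))   # indices of the 2 most informative baselines
```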
Mitigating Distribution Shift in Stock Price Data via Return-Volatility Normalization for Accurate Prediction
How can we address distribution shifts in stock price data to improve stock price prediction accuracy? Stock price prediction has attracted attention from both academia and industry, driven by its potential to uncover complex market patterns and enhance decision-making. However, existing methods often fail to handle distribution shifts effectively, focusing on scaling or representation adaptation without fully addressing distributional discrepancies and shape misalignments between training and test data. We propose ReVol (Return-Volatility Normalization for Mitigating Distribution Shift in Stock Price Data), a robust method for stock price prediction that explicitly addresses the distribution shift problem. ReVol leverages three key strategies to mitigate these shifts: (1) normalizing price features to remove sample-specific characteristics, including return, volatility, and price scale, (2) employing an attention-based module to estimate these characteristics accurately, thereby reducing the influence of market anomalies, and (3) reintegrating the sample characteristics into the predictive process, restoring the traits lost during normalization. Additionally, ReVol combines geometric Brownian motion for long-term trend modeling with neural networks for short-term pattern recognition, unifying their complementary strengths. Extensive experiments on real-world datasets demonstrate that ReVol enhances the performance of state-of-the-art backbone models in most cases, achieving an average improvement of more than 0.03 in IC and over 0.7 in SR across various settings.
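The sketch below illustrates the general recipe of removing and later reintegrating sample-specific return, volatility, and price scale; the exact statistics, and their attention-based estimation in ReVol, may differ from this simple version.

```python
import numpy as np

def return_volatility_normalize(prices, eps=1e-8):
    """Normalize a price window by its return and volatility (illustrative).

    prices: 1-D array of closing prices for one sample window. Returns the
    normalized log-return series plus the (mean return, volatility, last price)
    statistics needed to map predictions back to the original scale.
    """
    log_ret = np.diff(np.log(prices))
    mu, sigma = log_ret.mean(), log_ret.std() + eps
    normalized = (log_ret - mu) / sigma
    return normalized, (mu, sigma, prices[-1])

def denormalize_next_price(pred_norm_ret, stats):
    """Reintegrate the stored characteristics to recover a next-price forecast."""
    mu, sigma, last_price = stats
    return last_price * np.exp(pred_norm_ret * sigma + mu)

norm, stats = return_volatility_normalize(np.array([100.0, 101.5, 100.8, 102.3, 103.0]))
print(denormalize_next_price(0.0, stats))   # a "neutral" prediction drifts by the mean return
```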
Context-aware Sequential Bundle Recommendation via User-specific Representations
How can we recommend bundles that reflect users' changing preferences over time? Sequential bundle recommendation aims to recommend bundles of items while capturing users' evolving preferences over time. Unlike traditional bundle or sequential recommendation, this task requires modeling both the structural composition of bundles and the temporal dynamics of user behavior. We identify three major challenges: (1) dynamic user preferences across bundle interactions, (2) user-dependent attention to different items in the same bundle, and (3) users' diverse preferences regarding bundling strategies. To address these, we propose CoReSBR (Contextualized Representation for Sequential Bundle Recommendation), an adaptive framework that constructs bundle representations contextualized by time-aware user preferences. CoReSBR encodes recent user interactions to reflect preference shifts, assigns attention-based weights to items in bundles using the user embedding as the query, and integrates multiple bundling strategies through user-specific combination. Extensive experiments on real-world datasets demonstrate that CoReSBR outperforms state-of-the-art methods, achieving up to 8.91% higher nDCG and 8.05% higher recall.
Learning Graph Edit Distance via Node Matching Patterns
Graph edit distance (GED) is a general and versatile measure of graph similarity. Many combinatorial algorithms have been proposed for computing exact GED, but they suffer from the high computational cost due to the NP-hardness of GED computation. To address this challenge, approximate GED computation techniques have been extensively studied. These techniques are generally twofold: early work is based on combinatorial algorithms that restrict search space for efficient computation, while more recent approaches employ machine learning techniques to estimate GED. Although learning-based approaches generally achieve higher estimation accuracy than combinatorial approximations, they often rely on smoothed node embeddings to model node-to-node interactions, which may limit their ability to capture fine-grained structural differences. To alleviate this limitation, we exploit the insight that sequential variations in node interactions across GNN layers exhibit informative patterns. In this paper, we design a novel neural model, Grasp, that learns to extract and leverage these patterns to predict pairwise node matching probabilities and their associated costs. By aggregating these estimates from the perspectives of both input graphs, Grasp effectively and accurately computes an approximate GED. Experimental results on real-world datasets demonstrate that Grasp significantly improves estimation accuracy over existing approximate methods.
SwaGNER: Leveraging Span-aware Grid Transformers for Accurate Nested Named Entity Recognition
How can we accurately recognize overlapping entity spans in text while effectively capturing global context among spans? Nested Named Entity Recognition (nested NER) becomes challenging in the presence of nested or overlapping entity spans. Traditional span-based methods enumerate all possible spans, resulting in high computational costs and severe label imbalance from excessive negative spans which are non-entities. Additionally, they often fail to fully capture global context among overlapping entities. In this paper, we propose SwaGNER, a novel approach that dynamically selects span candidates via boundary detection, and encodes their interactions using a contextual grid. By filtering out low-confidence spans early, SwaGNER focuses on high-quality candidates, arranging them in a grid to explicitly model global context and span relationships. This integrated boundary detection and span classification approach reduces error propagation and effectively leverages sentence-wide context. Experiments demonstrate that SwaGNER achieves the state-of-the-art performance on both nested NER and flat NER.
Pedagogy-R1: Pedagogical Large Reasoning Model and Well-balanced Educational Benchmark
Recent advances in large reasoning models (LRMs) have demonstrated impressive capabilities in highly structured domains such as mathematics and programming. However, their application to education, where effective reasoning must be pedagogically meaningful, context-sensitive, and responsive to real student needs, remains relatively unexplored. Existing large language models (LLMs) often struggle to deliver instructional coherence, provide formative feedback, or simulate sophisticated teacher decision-making, limiting their practical utility in educational settings. To fill this gap, we present Pedagogy-R1, a comprehensive pedagogical reasoning framework designed to adapt LLMs for authentic classroom tasks. Our approach features three key innovations: (1) a distillation-based training pipeline that uses pedagogically filtered outputs for instruction tuning, (2) the Well-balanced Educational Benchmark (WBEB), which systematically evaluates models across five dimensions (subject knowledge, pedagogical knowledge, knowledge tracing, essay scoring, and real-world teacher decision-making), and (3) the Chain-of-Pedagogy (CoP) prompting strategy, employed both to generate pedagogically enriched training data and to elicit teacher-like reasoning during inference. We conduct a mixed-methods evaluation, combining fine-grained quantitative analyses of model performance with qualitative insights into the model's pedagogical reasoning patterns.
BOVIS: Bias-Mitigated Object-Enhanced Visual Emotion Analysis
Visual emotion analysis is a promising field that aims to predict emotional responses elicited by visual stimuli. While recent advances in deep learning have significantly improved emotion detection capabilities, existing methods often fall short because of their exclusive focus on either holistic visual features or semantic content, thereby neglecting their interplay. To address this limitation, we introduce BOVIS, a Bias-Mitigated Object-Enhanced Visual Emotion Analysis framework. To capture the subtle relationships between visual and semantic features and enrich the understanding of emotional contexts, BOVIS leverages pre-trained models to extract comprehensive image features, integrate object-level semantics, and enhance contextual information. Moreover, BOVIS incorporates a bias mitigation strategy that involves an adjusted Mean Absolute Error loss function alongside an Inverse Probability Weighting method to address dataset imbalances and enhance fairness in emotion prediction. Comprehensive evaluations across various benchmark datasets demonstrate the effectiveness of the BOVIS framework in enhancing visual emotion analysis. The results reveal that the synergy between object-specific features and holistic visual representations improves the accuracy and interpretability of emotion analysis, while optimizing bias mitigation enhances fairness and increases reliability. The code is available at https://github.com/leeyubin10/BOVIS.git.
What Data is Really Necessary? A Feasibility Study of Inference Data Minimization for Recommender Systems
Data minimization is a legal principle requiring personal data processing to be limited to what is necessary for a specified purpose. Operationalizing this principle for recommender systems, which rely on extensive personal data, remains a significant challenge. This paper conducts a feasibility study on minimizing implicit feedback inference data for such systems. We propose a novel problem formulation, analyze various minimization techniques, and investigate key factors influencing their effectiveness. We demonstrate that substantial inference data reduction is technically feasible without significant performance loss. However, its practicality is critically determined by two factors: the technical setting (e.g., performance targets, choice of model) and user characteristics (e.g., history size, preference complexity). Thus, while we establish its technical feasibility, we conclude that data minimization remains practically challenging and its dependence on the technical and user context makes a universal standard for data 'necessity' difficult to implement.
Fast Outlier Detection in Oblique Subspaces
Subspace outlier detection is a fundamental data mining task for high-dimensional data with diverse applications across widely varying domains. Existing subspace hashing methods have been capable of identifying accurate and interpretable outliers in axis-parallel subspaces in linear time. However, these methods simply fail when applied to arbitrary-shaped, schema-less data that lack well-defined attributes or dimensions, such as time series and graphs. In this paper, we introduce a new notion of oblique subspaces defined on pairwise object proximity functions without requiring explicit multidimensional representations of the underlying data. By hashing data objects into arbitrarily oriented oblique subspaces, we can construct subspace hashing histograms for efficient and cost-effective outlier detection. Our proposed solution, OS-Hash (Oblique Subspace Hashing), is a linear-time, constant-space method applicable to not only multidimensional but also arbitrary-shaped data, and it can further be extended to the data stream setting for subspace outlier detection. Our experimental studies on real-world multidimensional, time series, and graph data demonstrate the efficiency and efficacy of OS-Hash, which outperforms state-of-the-art outlier detection methods in terms of both runtime performance and accuracy.
Bridging Queries and Tables through Entities in Open-Domain Table Retrieval
Open-domain table retrieval plays a vital role in accessing information from structured formats on the web, yet it remains less explored than text retrieval. Table cells primarily consist of phrases and words, which include numerous entities, such as times, locations, persons, and organizations. While emphasizing entities in text retrieval has been extensively studied, there is a significant lack of research on their applications in table retrieval. In this work, we explore how to leverage entities in tables to improve retrieval performance. We investigate the important role of entities in table retrieval from a statistical perspective and propose an Entity-Centric Alignment framework for Table retrieval (ECAT). Specifically, we use entity types to highlight entities appearing in queries and tables. Then, we propose an entity-driven late interaction paradigm based on entity representations for dense and sparse retrievers, respectively. Our proposed framework is plug-and-play and flexible, making it easy to integrate into existing table retrievers. Empirical results on table retrieval benchmarks, NQ-TABLES and OTT-QA, show that our proposed ECAT is effective in enhancing existing retrievers. Extensive analyses confirm the efficacy of ECAT's different components. Our code and dataset are available at https://github.com/Trustworthy-Information-Access/ECAT.
Publicly Verifiable and Fault-Tolerant Privacy-Preserving Aggregation for Federated Learning
Publicly verifiable privacy-preserving aggregation is widely regarded as an effective approach to protect user privacy and ensure the integrity of the aggregated model published by the aggregator in Federated Learning (FL). State-of-the-art solutions either fail to guarantee unforgeability when the aggregator colludes with malicious users or require costly cryptographic operations during the online aggregation phase and lack fault tolerance. In this work, we propose eVTPA, the first online-efficient, publicly verifiable, and fault-tolerant privacy-preserving aggregation protocol considering malicious users and aggregators for FL. We introduce a novel collusion-resistant symmetric masking technique to conceal users' local gradients while ensuring the correctness of the aggregated model through a publicly verifiable aggregation signature algorithm. To improve the efficiency of online signature generation, we design a specialized precomputation-based acceleration method and leverage the randomness of masking to enable batch processing. Furthermore, eVTPA adopts a dynamic mask update mechanism that tolerates user dropouts without affecting the validation of the aggregated model. Security analysis shows that eVTPA meets FL's confidentiality, integrity, and authenticity requirements. Experimental results demonstrate that our scheme maintains model classification accuracy while achieving at least a 7.85× faster online aggregation than related solutions at the same security level.
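For intuition, the sketch below shows the generic flavor of mask-based secure aggregation, where pairwise masks cancel in the sum; eVTPA's collusion-resistant symmetric masking, verifiable aggregation signatures, and dropout handling are substantially more involved and are not reproduced here.

```python
import numpy as np

def pairwise_masks(num_users, dim, seed=42):
    """Generic pairwise masking for secure aggregation (illustration only).

    Each unordered pair (i, j) shares a pseudorandom mask; user i adds it and
    user j subtracts it, so every mask cancels when all masked gradients are
    summed, revealing only the aggregate.
    """
    rng = np.random.default_rng(seed)
    shared = {(i, j): rng.normal(size=dim)
              for i in range(num_users) for j in range(i + 1, num_users)}
    masks = np.zeros((num_users, dim))
    for (i, j), m in shared.items():
        masks[i] += m
        masks[j] -= m
    return masks

gradients = np.random.default_rng(0).normal(size=(4, 3))       # 4 users, 3-dim gradients
masked = gradients + pairwise_masks(4, 3)
print(np.allclose(masked.sum(axis=0), gradients.sum(axis=0)))   # True: masks cancel in the aggregate
```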
DO: An Efficient Deep Reinforcement Learning Approach for Optimal Route with Collective Spatial Keywords
Given a source, a destination, and a set of required keywords, the Optimal Route with Collective Spatial Keywords (ORCSK) query aims to find the shortest route covering all keywords. Existing Point of Interest (POI) candidate set-based and path expansion-based methods frequently produce inferior route quality or excessive time overhead, particularly under large-scale query keywords. To address this challenge, we introduce the DO framework, which pioneers the use of Deep Reinforcement Learning for the ORCSK query. Specifically, DO first integrates a spatial index with the H2H index to generate and refine high-quality candidate sets. Subsequently, DO utilizes a Transformer-based model to determine the optimal route from these sets. To effectively combine spatial distance and POI attributes, we propose a novel dual-cross encoder architecture. Furthermore, leveraging this architecture, we introduce a multi-route generating strategy that exploits parallel computing to enhance route quality. Our experiments on real-life road networks demonstrate superior route quality and response time compared to the state-of-the-art method, with an average improvement of 1-2 orders of magnitude in response time, and show that DO maintains high efficiency even under large-scale query keywords or dynamic POI attribute scenarios.
Where Do LLMs Go Wrong? Diagnosing Automated Peer Review via Aspect-Guided Multi-Level Perturbation
Large Language Models (LLMs) are increasingly integrated into academic peer review, prompting debates between full automation and purely human evaluation. Emerging evidence suggests optimal peer review leverages both human expertise and AI capabilities, and several major conferences have already adopted AI-assisted reviewing practices. However, effectively integrating these reviewers requires an aspect-based understanding of LLM vulnerabilities, clearly identifying specific dimensions where AI is most prone to error. Prior studies broadly caution against LLM biases but lack precise, aspect-specific insights necessary for informed human-AI partnerships in peer-review processes. We propose an aspect-guided, multi-level perturbation framework to systematically diagnose LLM weaknesses in automated peer review. By introducing targeted perturbations across key review components (papers, reviews, rebuttals) and evaluating impacts along critical quality dimensions (contribution, soundness, presentation, tone, completeness), our framework functions as a diagnostic tool: deviations from expected rating shifts after perturbation directly reveal specific LLM vulnerabilities. Our empirical analyses uncover recurring weaknesses, including misclassification of methodological flaws, disproportionate influence of strong rejection recommendations, inadequate responses to incomplete or negatively toned rebuttals, and misinterpretation of incorrect critiques as rigorous evaluations. These vulnerabilities consistently persist across diverse prompting strategies and a broad set of widely-used LLMs (e.g., GPT-4o, Gemini 2.0, LLaMA 3). This diagnostic framework provides granular insights into LLM limitations, empowering conference organizers to establish pragmatic, aspect-specific guidelines and enabling balanced, informed, and robust peer-review practices.
Tilia: Enhancing LIME with Decision Tree Surrogates
Local Interpretable Model-Agnostic Explanations (LIME) is a widely adopted framework for interpreting opaque models due to its simplicity and intuitiveness. However, LIME suffers from unreliability rooted in two core issues: (i) low fidelity, where the surrogate model fails to accurately approximate the target model's behavior, and (ii) instability, where the generated explanations vary significantly across runs. While prior work has proposed techniques to enhance LIME, they remain fundamentally limited by the expressiveness of linear surrogate models, which cannot adequately capture complex decision boundaries. In this work, we introduce Tilia, a novel method that employs shallow decision tree regressors as the surrogate model, leveraging its structured and deterministic nature to improve both fidelity and stability. Tilia also provides insight into the interplay between surrogate models and sampling strategies, revealing new directions for enhancing explanation reliability. Across extensive experiments on tabular and textual datasets, Tilia outperforms LIME and recent variants on both fidelity and stability, achieving up to 100% approximation of the opaque model and entirely consistent explanations (i.e., 0 Jaccard distance). Tilia maintains practical efficiency, completing explanations in seconds even for datasets with over 100 features. These results position Tilia as a robust alternative for model-agnostic explanations. The code is available at https://github.com/neur1n/tilia.
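The core idea of swapping the linear surrogate for a shallow tree can be sketched in a few lines, as below; the Gaussian sampling scheme, depth, and use of feature importances are illustrative assumptions rather than Tilia's exact procedure.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def tree_surrogate_explanation(black_box_predict, x, n_samples=2000, scale=0.3,
                               max_depth=3, seed=0):
    """Fit a shallow decision-tree surrogate around one instance (LIME-style).

    black_box_predict: callable mapping an (n, d) array to predicted scores.
    Perturbations are drawn around x; the fitted tree's feature_importances_
    serve as the local explanation, and its determinism (given a fixed seed)
    yields stable explanations across runs.
    """
    rng = np.random.default_rng(seed)
    X = x + scale * rng.normal(size=(n_samples, x.shape[0]))
    y = black_box_predict(X)
    surrogate = DecisionTreeRegressor(max_depth=max_depth, random_state=seed)
    surrogate.fit(X, y)
    return surrogate, surrogate.feature_importances_

# toy opaque model: only feature 0 matters locally
f = lambda X: np.tanh(3 * X[:, 0]) + 0.01 * X[:, 1]
tree, importances = tree_surrogate_explanation(f, x=np.array([0.1, 0.5, -0.2]))
print(importances)   # feature 0 dominates the local explanation
```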
A Node-Aware Dynamic Quantization Approach for Graph Collaborative Filtering
In the realm of collaborative filtering recommendation systems, Graph Neural Networks (GNNs) have demonstrated remarkable performance but face significant challenges in deployment on resource-constrained edge devices due to their high embedding parameter requirements and computational costs. Applying common quantization methods directly to node embeddings overlooks their graph-based structure, causing error accumulation during message passing and degrading the quality of quantized embeddings. To address this, we propose Graph-based Node-Aware Dynamic Quantization training for collaborative filtering (GNAQ), a novel quantization approach that leverages graph structural information to improve the balance between efficiency and accuracy of GNNs for Top-K recommendation. GNAQ introduces a node-aware dynamic quantization strategy that adapts quantization scales to individual node embeddings by incorporating graph interaction relationships. Specifically, it initializes quantization intervals based on node-wise feature distributions and dynamically refines them through message passing in GNN layers. This approach mitigates the information loss caused by fixed quantization scales and captures hierarchical semantic features in user-item interaction graphs. Additionally, GNAQ employs graph relation-aware gradient estimation to replace traditional straight-through estimators, ensuring more accurate gradient propagation during training. Extensive experiments on four real-world datasets demonstrate that GNAQ outperforms state-of-the-art quantization methods, including BiGeaR and N2UQ, achieving average improvements of 27.8% in Recall@10 and 17.6% in NDCG@10 under 2-bit quantization. In particular, GNAQ maintains the performance of full-precision models while reducing model size by 8 to 12 times; in addition, training is twice as fast as quantization baseline methods.
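For intuition about per-node quantization scales, here is a minimal PyTorch sketch that quantizes each node embedding with its own scale and falls back on a plain straight-through estimator. GNAQ's message-passing refinement of the intervals and its graph relation-aware gradient estimation are not reproduced here; the symmetric bit-range handling is an assumption.

    import torch

    def node_aware_quantize(emb, n_bits=2):
        """Per-node (per-row) uniform quantization of a node-embedding table.
        Each node gets its own scale derived from its feature distribution."""
        qmax = 2 ** (n_bits - 1) - 1                               # symmetric signed range
        scale = emb.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
        q = torch.clamp(torch.round(emb / scale), -qmax - 1, qmax)
        deq = q * scale                                            # dequantized embeddings
        # Plain straight-through estimator so gradients reach the full-precision table
        # (GNAQ replaces this with a graph relation-aware estimator).
        return emb + (deq - emb).detach()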
KUG: Joint Enhancement of Internal and External Knowledge for Retrieval-Augmented Generation
Query enhancement, a pivotal methodology in Retrieval-Augmented Generation (RAG) for addressing information scarcity in queries, has garnered increasing research attention. Nevertheless, existing approaches overlook the inherent distinctions between domain-specific knowledge and external factual sources during integration. To bridge this gap, we propose KUG (Knowledge-Update-Generation), a novel RAG framework that leverages internal knowledge semantics to ensure query enhancement efficacy, validates and dynamically updates knowledge representations using external evidence, and achieves systematic integration through knowledge graph embeddings. Extensive experiments on six standard BEIR benchmarks demonstrate that KUG outperforms the state-of-the-art methods, achieving an improvement of 1%-2% in recall metrics. Notably, the framework demonstrates significant performance gains in multi-hop reasoning tasks, advancing the development paradigm for RAG systems. The code will be public soon.
Content-Agnostic Moderation for Stance-Neutral Recommendations
Personalized recommendation systems often drive users towards more extreme content, exacerbating opinion polarization. While content-aware moderation has been proposed to mitigate these effects, such approaches risk curtailing the freedom of speech and information. To address this concern, we propose and explore the feasibility of content-agnostic moderation as an alternative approach for reducing polarization. Content-agnostic moderation does not rely on the actual content being moderated, arguably making it less prone to forms of censorship. We establish theoretically that content-agnostic moderation cannot be guaranteed to work in a fully generic setting. However, we show that it can often be effectively achieved in practice with plausible assumptions. We introduce two novel content-agnostic moderation methods that modify recommendations from the content recommender to disperse user-item co-clusters without relying on content features. To evaluate the potential of content-agnostic moderation in controlled experiments, we built a simulation environment to analyze the closed-loop behavior of a system with a given set of users, a recommendation system, and a moderation approach. Through comprehensive experiments in this environment, we show that our proposed moderation methods significantly enhance stance neutrality and maintain high recommendation quality across various data scenarios. Our results indicate that achieving stance neutrality without direct content information is not only feasible but can also help develop more balanced and informative recommendation systems without substantially degrading user engagement.
TKHist: Cardinality Estimation for Join Queries via Histograms with Dominant Attribute Correlation Finding
Cardinality estimation has long been crucial for cost-based database optimizers in identifying optimal query execution plans, attracting significant attention over the past decades. While recent advancements have significantly improved the accuracy of multi-table join query estimations, these methods introduce challenges such as higher space overhead, increased latency, and greater complexity, especially when integrated with the binary join framework. In this paper, we introduce a novel cardinality estimation method named TKHist, which addresses these challenges by relaxing the uniformity assumption in histograms. TKHist captures bin-wise non-uniformity information, enabling accurate cardinality estimation for join queries without filter predicates. Furthermore, we explore the attribute independence assumption, which can lead to significant over-estimation rather than under-estimation in multi-table join queries. To address this issue, we propose the dominating join path correlation discovery algorithm to highlight and manage correlations between join keys and filter predicates. Our extensive experiments on popular benchmarks demonstrate that TKHist reduces error variance by 2-3 orders of magnitude compared to SOTA methods, while maintaining comparable or lower memory usage.
Contextual Representation Anchor Network for Mitigating Selection Bias in Few-Shot Drug Discovery
In the drug discovery process, the low success rate of drug candidate screening often leads to insufficient labeled data, causing the few-shot learning problem in molecular property prediction. Existing methods for few-shot molecular property prediction overlook the sample selection bias, which arises from non-random sample selection in chemical experiments. This bias in data representativeness leads to suboptimal performance. To overcome this challenge, we present a novel method named Contextual Representation Anchor Network (CRANet), where an anchor refers to a cluster center of the representations of molecules and serves as a bridge to transfer enriched contextual knowledge into molecular representations and enhance their expressiveness. CRANet introduces a dual-augmentation mechanism that includes context augmentation, which dynamically retrieves analogous unlabeled molecules and captures their task-specific contextual knowledge to enhance the anchors, and anchor augmentation, which leverages the anchors to augment the molecular representations. We evaluate our approach using the MoleculeNet and FS-Mol benchmarks, as well as through domain transfer experiments. The outcomes indicate that CRANet surpasses current state-of-the-art methods by 0.10% to 5.48% in AUC and 2.52% in ΔAUC-PR metrics, showcasing its exceptional generalization abilities.
Exploring the Upper Limits of Text-Based Collaborative Filtering Using Large Language Models: Discoveries and Insights
Text-based collaborative filtering (TCF) has emerged as the prominent technique for text and news recommendation, employing language models (LMs) as text encoders to represent items. However, the current landscape of TCF models mainly relies on the utilization of relatively small or medium-sized LMs. The potential impact of using larger, more powerful language models (such as those with over 100 billion parameters) as item encoders on recommendation performance remains uncertain. Can we anticipate unprecedented results and discover new insights? To address this question, we undertake a comprehensive series of experiments aimed at exploring the performance limits of the TCF paradigm. Specifically, we progressively augment the scale of item encoders, ranging from one hundred million to one hundred billion parameters, in order to reveal the scaling limits of the TCF paradigm. Moreover, we investigate whether these exceptionally large LMs have the potential to establish a universal item representation for the recommendation task, thereby revolutionizing the traditional ID paradigm, which is considered a significant obstacle to developing transferable ''one model fits all'' recommender models. Our study not only demonstrates positive results but also uncovers unexpected negative outcomes, illuminating the current state of the TCF paradigm within the community. These findings will evoke deep reflection and inspire further research on text-based recommender systems.
Linking Ordered and Orderless Modeling for Sequential Recommendation
Sequential recommendation is pivotal to personalized services by modeling the temporal dynamics of user behavior. However, existing methods often rely on abundant interactions, making them unreliable under sparse user interactions. Recent attempts to integrate sequential signals with orderless structural cues (e.g., global co-occurrence) help alleviate this issue but typically adopt tight fusion, which can dilute order-aware signals. To address this, we propose LOOM (Loosely-Coupled Ordered-Orderless Modeling), a structure-agnostic guidance module for sequential recommenders. LOOM is sequence-first: The sequential backbone acts as a teacher, guiding orderless carriers via one-way KL divergence, with recency-aware weighting and confidence-modulated strength to filter stale or uncertain relations. This preserves temporal modeling while selectively incorporating complementary orderless knowledge. Experiments on four public datasets and various sequential architectures show that LOOM outperforms state-of-the-art methods. Code is available at https://github.com/cqu-jia/LOOM.
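A minimal sketch of the one-way, recency-weighted KL guidance outlined above, assuming a simple exponential-decay recency weight and a temperature tau; the exact weighting and confidence modulation in LOOM may differ (see the linked repository).

    import torch
    import torch.nn.functional as F

    def one_way_kl_guidance(teacher_logits, student_logits, timestamps, tau=1.0, decay=0.1):
        """One-way KL(teacher || student) with recency-aware weights.
        teacher_logits: sequential backbone; student_logits: orderless carrier.
        timestamps: larger = more recent interaction (illustrative weighting)."""
        with torch.no_grad():
            p_teacher = F.softmax(teacher_logits / tau, dim=-1)               # teacher not updated
            recency = torch.exp(-decay * (timestamps.max() - timestamps))     # stale = lower weight
        log_q_student = F.log_softmax(student_logits / tau, dim=-1)
        kl = F.kl_div(log_q_student, p_teacher, reduction="none").sum(dim=-1)
        return (recency * kl).mean()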
Give Me Some SALT: Structure-Aware Link Modeling for Temporal Weighted Link Prediction
In dynamic graph analysis, research has predominantly focused on temporal link prediction (TLP) for unweighted links, with growing interest in predicting temporal link weights in recent years. Temporal weighted link prediction (TWLP) aims to estimate both link existence and link weights, and is naturally formulated as a regression task. The long-tail distribution and short-term randomness of link weights pose significant challenges for TWLP. In this paper, we introduce SALT, a Structure-Aware Link modeling framework for Temporal weighted link prediction, which consists of a Weighted Link Encoder (WLE) and a Temporal Link State Space Module (TLSSM). WLE encodes each snapshot into link-centric embeddings with common neighbor information, and addresses the long-tail issue by leveraging weights to adjust the embedding distribution. Additionally, TLSSM is designed to handle short-term randomness in temporal modeling. On eight datasets, our model achieves average reductions of 19.86% in RMSE and 24.61% in MAE compared to state-of-the-art baselines.
QGCMA: A Framework for Knowledge-Based Visual Question Answering
Visual Question Answering (VQA) systems encounter formidable challenges when tackling complex queries that demand external knowledge integration and multi-modal reasoning. Current methodologies often grapple with the effective alignment of visual and textual features, as well as the utilization of structured knowledge bases, which limits their performance in handling intricate semantic and inferential tasks. To address these critical issues, this paper presents a framework based on three key innovations. Firstly, the Question-Guided Attention (QGA) mechanism adaptively steers the model's focus towards visual regions and knowledge entities that are semantically congruent with the query. By doing so, it ensures that contextually relevant information is prioritized during the feature extraction process, enhancing the model's ability to capture pertinent visual and knowledge cues. Secondly, the Cross-Modal Alignment (CMA) module employs a contrastive learning strategy to enforce precise alignment across visual, textual, and knowledge modalities. This approach effectively mitigates the detrimental effects of spurious correlations by enhancing semantic consistency among heterogeneous data sources, thereby improving the overall quality of multi-modal feature integration. Thirdly, the Dynamic Knowledge Integration (DKI) component empowers the model to dynamically select and fuse knowledge information from external graph structures. This functionality significantly augments the model's reasoning capacity, enabling it to handle questions that necessitate compositional inference over structured knowledge. Comprehensive experimental evaluations conducted on the OK-VQA and VQA v2 benchmarks demonstrate the superiority of our proposed method over existing state-of-the-art methods.
Scenario-Wise Rec: A Multi-Scenario Recommendation Benchmark
Multi-Scenario Recommendation (MSR) tasks, referring to building a unified model to enhance performance across all recommendation scenarios, have recently gained considerable attention. However, current research in MSR faces two significant challenges that hinder the field's development: the absence of uniform procedures for multi-scenario dataset processing, thus hindering fair comparisons, and most models being closed-source, which complicates comparisons with current SOTA models. Consequently, we introduce our benchmark, Scenario-Wise Rec, which comprises six public datasets and twelve baseline models, along with a training and evaluation pipeline. We further validate Scenario-Wise Rec on an industrial advertising dataset, underscoring its robustness. We hope the benchmark will give researchers clear insights into prior work, enabling them to develop novel models and thereby fostering a collaborative research ecosystem in MSR. Our source code is publicly available (https://github.com/Applied-Machine-Learning-Lab/Scenario-Wise-Rec).
Structural Entropy-based Multivariate Time Series Forecasting
Multivariate time series (MTS) forecasting is crucial for predicting the future states of complexly coupled variables based on historical observations. To effectively capture the intricate interdependencies within MTS, graph-based methods have emerged as powerful tools. However, existing graph construction methods often produce structures that fail to preserve key temporal and cross-variable dependencies, introducing redundant or irrelevant connections. To address these challenges, we propose a structural entropy-based approach for MTS forecasting. The approach optimizes graph structures by reducing structural redundancy, thereby improving forecasting accuracy. Initially, we represent the temporal dependencies by constructing an encoding tree incrementally. Through hierarchical organization of time steps, the temporal evolution is adaptively captured. Subsequently, we derive a community-aware representation by building an encoding tree over the variables in MTS, extracting homogeneous communities from the tree structure while integrating community influence to better capture inter-variable dependencies. Finally, we present a training algorithm designed to generate accurate predictions for MTS, accompanied by a unified loss function that integrates forecasting inaccuracies with variations in structural entropy. Empirical findings on real-world datasets substantiate that our approach outperforms state-of-the-art models in capturing dependencies and enhancing forecasting precision.
OFedED: One-shot Federated Learning with Model Ensemble and Dataset Distillation
One-shot federated learning (FL) has gained traction due to its communication efficiency and scalability. However, unlike traditional FL, which can frequently align client models through multiple rounds of client training and server aggregation, one-shot FL allows only a single communication round, causing each client to easily overfit its local data and leading to divergent objectives. Without any chance to iteratively correct these biases or mitigate heterogeneity, the aggregated model significantly deviates from the optimum achieved under centralized training. To address this challenge, we propose OFedED, a one-shot FL framework that preserves privacy and fully exploits client data by combining local data distillation with server-side ensemble learning. Each client distills its own dataset into an ultra-compact coreset that retains essential distributional characteristics; the server aggregates these coresets to guide ensemble training that captures inter-client heterogeneity, harnesses complementary knowledge, corrects local bias, and drives performance close to centralized training. In addition, we theoretically show that, under mild assumptions on local data distillation, the server can simulate a centralized optimization process by fine-tuning on the aggregated distilled data, effectively bypassing the need for multiple communication rounds; this shows that properly distilled data can encode sufficient task-relevant information to support centralized-level optimization. Extensive experiments reveal that OFedED consistently and significantly outperforms SOTA methods, achieving an improvement of up to 9.17% on MNIST and 3.97% on CIFAR-10, with robustness further verified by experiments using ResNet and various server-client architectures.
Ensemble Pruning via Graph Neural Networks
Ensemble learning is a pivotal machine learning strategy that combines multiple base learners to achieve prediction accuracy surpassing that of any individual model. Despite its effectiveness, large-scale ensemble learning consumes a considerable amount of resources. Ensemble pruning addresses this issue by selecting a subset of base learners from the original ensemble to form a sub-ensemble, while maintaining or even improving the performance of the original model. However, existing ensemble pruning strategies often rely on heuristic solutions that may fail to capture complex interactions among base learners. To address this limitation, in this work, we model the base learners in an ensemble as a weighted and attributed graph, where node features represent characteristics of each learner and edge weights represent relationships between the base learners. Leveraging this representation, we propose a novel ensemble pruning method based on graph neural networks (GNNs). Our approach incorporates specialized GNN architectures designed for bagging and boosting ensembles. Experimental results demonstrate that our method not only improves prediction accuracy but also significantly reduces inference time across diverse datasets. Our implementation is available at https://github.com/TechnologyAiGroup/GRE.
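To make the graph construction concrete, the following sketch builds a weighted, attributed graph over base learners from validation predictions, using accuracy and average disagreement as node features and pairwise agreement as edge weights. These particular features are illustrative choices under stated assumptions, not necessarily the ones used in the paper.

    import numpy as np

    def ensemble_graph(preds, y_val):
        """Build a weighted, attributed graph over base learners from validation predictions.
        preds: (n_learners, n_samples) hard predictions; y_val: ground-truth labels."""
        acc = (preds == y_val[None, :]).mean(axis=1)
        # Edge weights: pairwise agreement rate between base learners.
        agree = (preds[:, None, :] == preds[None, :, :]).mean(axis=2)
        np.fill_diagonal(agree, 0.0)
        # Node features: accuracy and average disagreement with the rest of the ensemble.
        disagreement = 1.0 - agree.sum(axis=1) / (len(preds) - 1)
        node_feats = np.stack([acc, disagreement], axis=1)
        return node_feats, agree   # inputs for a GNN that scores learners for pruning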
STA-GANN: A Valid and Generalizable Spatio-Temporal Kriging Approach
Spatio-temporal tasks often encounter incomplete data arising from missing or inaccessible sensors, making spatio-temporal kriging crucial for inferring the completely missing temporal information. However, current models struggle with ensuring the validity and generalizability of inferred spatio-temporal patterns, especially in capturing dynamic spatial dependencies and temporal shifts and in generalizing to unknown sensors. To overcome these limitations, we propose the Spatio-Temporal Aware Graph Adversarial Neural Network (STA-GANN), a novel GNN-based kriging framework that improves spatio-temporal pattern validity and generalization. STA-GANN integrates (i) a Decoupled Phase Module that senses and adjusts for timestamp shifts; (ii) Dynamic Data-Driven Metadata Graph Modeling, which updates spatial relationships using temporal data and metadata; and (iii) an adversarial transfer learning strategy to ensure generalizability. Extensive validation across nine datasets from four fields, together with theoretical evidence, demonstrates the superior performance of STA-GANN.
BordaRAG: Resolving Knowledge Conflict in Retrieval-Augmented Generation via Borda Voting Process
Recently, research has found that the documents retrieved in Retrieval-Augmented Generation (RAG) may contain knowledge that conflicts with each other, leading Large Language Models (LLMs) to generate incorrect responses. To solve this problem, existing approaches usually keep only the most frequently mentioned knowledge from these documents, since they assume that the most representative knowledge aligns best with the true answer. Although effective in certain scenarios, these approaches often underperform when the most frequent knowledge is not the correct one. From the voting perspective, these methods can be regarded as a Majority Voting (MV) process, which chooses the most frequent candidate among different pieces of candidate knowledge. However, we show that the underperformance of such methods stems from the fact that MV is only effective with a small number of candidates and binary voting scores. In contrast, in the RAG scenario, the candidates (knowledge) are very diverse, and the voting scores (document relevance scores) are typically continuous. Simply adopting MV in RAG will therefore result in poor LLM performance. In voting theory, on the other hand, preference-based voting methods represented by Borda Voting (BV) consider the whole preference order of voters over all candidates, enabling the selection of candidates that better represent the collective viewpoint. Inspired by this insight, we propose BordaRAG, a model designed to better select the most appropriate documents from conflicting documents. Specifically, BordaRAG first computes the preference scores of the documents over the candidate answers. After that, a BV component is designed to select the winning documents according to the preference scores. Finally, the chosen documents are provided to LLMs, which generate the final response. Experimental results on three open-domain QA datasets show that BordaRAG outperforms all baselines.
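The Borda step can be pictured with the toy sketch below, which aggregates each document's preference order over candidate answers into Borda scores and returns the winning answer plus its top supporting documents. The data layout and tie handling are assumptions for illustration, not BordaRAG's exact procedure.

    def borda_select(doc_scores, k=3):
        """Borda voting over candidate answers.
        doc_scores: {doc_id: {answer: preference_score}} -- each document scores all candidates.
        Returns the winning answer and the k documents that prefer it most."""
        answers = {a for prefs in doc_scores.values() for a in prefs}
        totals = {a: 0 for a in answers}
        for prefs in doc_scores.values():
            # Rank candidates by this document's preference; best rank earns the most Borda points.
            ranked = sorted(prefs, key=prefs.get, reverse=True)
            for rank, ans in enumerate(ranked):
                totals[ans] += len(ranked) - 1 - rank
        winner = max(totals, key=totals.get)
        supporters = sorted(doc_scores, key=lambda d: doc_scores[d].get(winner, 0), reverse=True)[:k]
        return winner, supporters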
A Cost-Aware Approach for Collaborating Large Language Models and Small Language Models
The emerging reasoning ability of large language models (LLMs) and the accompanying commercial applications offer a promising path for service providers to deploy intelligent agents on their own products through API calls. However, the black-box nature of LLMs has driven providers to try prompt tuning to improve reasoning quality for competitiveness, while the generated reasoning logic results in additional service costs. Although some works have proposed collaborating LLMs and Small Language Models (SLMs) to reduce the frequency of LLM calls, most overlook the actual number of tokens exchanged with the LLMs, which can still result in high costs. Furthermore, directly compressing the prompt to reduce tokens often leads to a significant accuracy loss. To address these challenges, we propose a cost-aware approach for collaborating LLMs and SLMs, named Coco. In our method, we design a confidence-based task assignment method that leverages the result confidence of SLMs to assess task complexity and determine whether LLM involvement is necessary. For complex tasks, the SLM adapts the input by compressing unnecessary information according to its confidence. Considering the potential loss of accuracy, prompt tuning-based reasoning optimization methods are introduced to guide the LLM in generating both a reasoning logic sketch and the final result. Finally, logic alignment is applied to fuse sketches from both models, ensuring the rationality of the reasoning logic. Experimental results on three open-source datasets demonstrate that our approach effectively reduces the cost of API calls to LLMs while ensuring reasoning accuracy and the reasonableness of the generated logic.
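A minimal sketch of the confidence-based routing idea, assuming hypothetical slm, llm, and compressor callables and an illustrative confidence threshold; Coco's actual task assignment, compression, and logic-alignment steps are more involved.

    def route_task(question, slm, llm, compressor, conf_threshold=0.8):
        """Confidence-based routing between a small and a large language model.
        slm(question) -> (answer, confidence); compressor and llm are assumed callables."""
        answer, confidence = slm(question)
        if confidence >= conf_threshold:
            return answer                      # SLM is confident: no LLM call, zero API cost
        # Low confidence: compress the prompt before paying for LLM tokens.
        compact_prompt = compressor(question, keep_ratio=min(1.0, confidence + 0.3))
        return llm(compact_prompt)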
Differentiable Probabilistic Logic Reasoning For Knowledge Graph Completion
For Knowledge Graph (KG) completion, probabilistic logic reasoning approaches enable effective rule mining but incur high computational cost, while embedding-based methods offer high efficiency but suffer from limited semantic understanding. Neuro-symbolic approaches combine both by employing embeddings to approximate probabilistic distributions, yet they face challenges in weight optimization and hurdles in scaling to large probabilistic graphs. To address these issues, we propose DPLogic, a differentiable probabilistic logic reasoning framework for KG completion. Initially, we construct a Markov logic network by selecting crucial formulas and constraining groundings to relevant subgraphs, effectively boosting the scalability of the framework. Subsequently, we represent formula weights through relation-specific embeddings by introducing neural logical operators, creating a differentiable pathway for end-to-end optimization. Finally, we obtain the distribution of unobserved KG triplets by facilitating the joint optimization of embedding-based and probabilistic distributions through an EM algorithm. Empirical findings on standardized datasets illustrate that our proposed DPLogic consistently surpasses state-of-the-art methodologies in terms of both efficacy and efficiency.
Calibrating on Kolmogorov-Arnold Network
Kolmogorov-Arnold Networks (KANs) are neural architectures inspired by the Kolmogorov-Arnold representation theorem that leverage B-spline parameterizations for flexible, locally adaptive function approximation. Although KANs can capture complex nonlinearities beyond those modeled by standard Multi-Layer Perceptrons (MLPs), they frequently exhibit miscalibrated confidence estimates, manifesting as overconfidence in dense data regions and underconfidence in sparse areas. In this work, we systematically examine the impact of four critical hyperparameters (Layer Width, Grid Order, Shortcut Function, and Grid Range) on the calibration of KANs. Furthermore, we introduce a novel Temperature-Scaled Loss (TSL) that integrates a temperature parameter directly into the training objective, dynamically adjusting the predictive distribution during learning. Both theoretical analysis and extensive empirical evaluations on standard benchmarks demonstrate that TSL significantly reduces calibration errors, thereby improving the reliability of probabilistic predictions. Overall, our study provides actionable insights into the design of spline-based neural networks and establishes TSL as a robust, loss-agnostic solution for enhancing calibration.
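As a rough sketch of folding a temperature into the training objective (not necessarily the paper's exact TSL formulation), a learnable temperature can rescale logits inside the loss:

    import torch
    import torch.nn as nn

    class TemperatureScaledLoss(nn.Module):
        """Cross-entropy with a learnable temperature applied during training,
        so the predictive distribution is adjusted while the network is being fit."""
        def __init__(self, init_temp=1.5):
            super().__init__()
            self.log_temp = nn.Parameter(torch.log(torch.tensor(init_temp)))

        def forward(self, logits, targets):
            temp = self.log_temp.exp()          # keep the temperature positive
            return nn.functional.cross_entropy(logits / temp, targets)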
Calibrating on Medical Segmentation Model through Signed Distance
Classical overlap metrics such as Dice or IoU quantify where a medical-image segmentation falls short but say nothing about the confidence of each prediction. Over-confident errors are particularly dangerous in clinical practice, where a single false-positive voxel may trigger an unnecessary biopsy. We introduce three contributions that jointly address spatial precision and reliability. (i) Signed-Distance Calibration (SDC) loss couples cross-entropy, local calibration and a differentiable signed-distance penalty, enforcing boundary accuracy while moderating confidence. (ii) A Spatially Adaptive Margin (SAM) module applies lightweight morphological transforms to ground-truth masks before computing the local target, sharpening ambiguous edges. (iii) Pixel-wise Expected Calibration Error (pECE) extends ECE to millions of voxels and penalises high-confidence false positives. Across four public datasets (ACDC, FLARE, BraTS, PROSTATE) and two backbones (U-Net, nnU-Net), SDC improves Dice by up to 4 percentage points and halves ECE compared with the state of the art, without sacrificing runtime. Code is available at: https://github.com/EagleAdelaide/SDC-Loss.
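The signed-distance idea can be sketched as follows, using scipy's Euclidean distance transform to build a signed distance map and penalizing confident foreground predictions far outside the ground-truth mask. This is a simplified stand-in for the SDC loss under stated assumptions, not the authors' implementation.

    import torch
    from scipy.ndimage import distance_transform_edt

    def signed_distance_penalty(probs, mask):
        """Penalize confident foreground probability far outside the ground-truth boundary.
        probs: (H, W) torch tensor of foreground probabilities; mask: (H, W) binary numpy array."""
        m = mask.astype(bool)
        # Signed distance map: negative inside the object, positive outside.
        sdm = distance_transform_edt(~m) - distance_transform_edt(m)
        sdm = torch.from_numpy(sdm).float()
        # High probability at large positive distance => costly false-positive confidence.
        return (probs * torch.relu(sdm)).mean()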
UniECS: Unified Multimodal E-Commerce Search Framework with Gated Cross-modal Fusion
The growth of e-commerce has created substantial demand for multimodal search systems that process diverse visual and textual inputs. Current e-commerce multimodal retrieval systems face two key limitations: they optimize for specific tasks with fixed modality pairings, and lack comprehensive benchmarks for evaluating unified retrieval approaches. To address these challenges, we introduce UniECS, a unified multimodal e-commerce search framework that handles all retrieval scenarios across image, text, and their combinations. Our work makes three key contributions. First, we propose a flexible architecture with a novel gated multimodal encoder that uses adaptive fusion mechanisms. This encoder integrates different modality representations while handling missing modalities. Second, we develop a comprehensive training strategy to optimize learning. It combines cross-modal alignment loss (CMAL), cohesive local alignment loss (CLAL), intra-modal contrastive loss (IMCL), and adaptive loss weighting. Third, we create M-BEER, a carefully curated multimodal benchmark containing 50K product pairs for e-commerce search evaluation. Extensive experiments demonstrate that UniECS consistently outperforms existing methods across four e-commerce benchmarks with fine-tuning or zero-shot evaluation. On our M-BEER benchmark, UniECS achieves substantial improvements in cross-modal tasks (up to 28% gain in R@10 for text-to-image retrieval) while maintaining parameter efficiency (0.2B parameters) compared to larger models like GME-Qwen2VL (2B) and MM-Embed (8B). Furthermore, we deploy UniECS in the e-commerce search platform of Kuaishou Inc. across two search scenarios, achieving notable improvements in Click-Through Rate (+2.74%) and Revenue (+8.33%). The comprehensive evaluation demonstrates the effectiveness of our approach in both experimental and real-world settings. The corresponding code, models, and datasets will be made publicly available at https://github.com/qzp2018/UniECS.
Federated Continual Recommendation
The increasing emphasis on privacy in recommendation systems has led to the adoption of Federated Learning (FL) as a privacy-preserving solution, enabling collaborative training without sharing user data. While Federated Recommendation (FedRec) effectively protects privacy, existing methods struggle with non-stationary data streams, failing to maintain consistent recommendation quality over time. On the other hand, Continual Learning Recommendation (CLRec) methods address evolving user preferences but typically assume centralized data access, making them incompatible with FL constraints. To bridge this gap, we introduce Federated Continual Recommendation (FCRec), a novel task that integrates FedRec and CLRec, requiring models to learn from streaming data while preserving privacy. As a solution, we propose F3CRec, a framework designed to balance knowledge retention and adaptation under the strict constraints of FCRec. F3CRec introduces two key components: Adaptive Replay Memory on the client side, which selectively retains past preferences based on user-specific shifts, and Item-wise Temporal Mean on the server side, which integrates new knowledge while preserving prior information. Extensive experiments demonstrate that F3CRec outperforms existing approaches in maintaining recommendation quality over time in a federated environment. Our code is available at https://github.com/Jaehyung-Lim/F3CRec-CIKM-25.
ConGM: Contrastive Graph Matching for Graph Self-Supervised Learning
Graph neural networks (GNNs) are widely used in information retrieval, but they often require large amounts of labeled data. To address this problem, self-supervised methods like graph contrastive learning (GCL) have been developed to learn from graph structures without labeled data. However, GCL faces a practical challenge: many traditional GCL methods differ fundamentally from GNNs in handling neighboring nodes, hindering effective contrastive learning. To address the above issue, we propose a graph self-supervised learning model based on Contrastive Graph Matching (ConGM). The model effectively mitigates the conflict between GCL methods and the homophily assumption of GNNs by using linear node matching and quadratic edge alignment mechanisms to treat some neighboring nodes as both positive and negative samples, rather than considering all neighboring nodes as negative samples as in traditional GCL methods. Additionally, to tackle the imbalance of positive and negative samples in edge alignment, we design a bi-level negative sample selection strategy to choose appropriate hard negative samples. Extensive experiments conducted on multiple benchmark datasets have validated the effectiveness of our proposed method.
To Know What User Concerns: Conceptual Knowledge Reasoning for User Satisfaction Estimation in E-Commerce Dialogue Systems
With the development of generative models, dialogue systems play an important role in many web applications, such as E-commerce and Question-Answering websites. Accurate user satisfaction estimation (USE) is a critical problem in measuring the quality of dialogue systems. In e-commerce, users usually seek consultation through dialogue systems to obtain detailed information about the products they intend to purchase. Existing studies mainly focus on analyzing user sentiment in a dialogue for USE, neglecting to understand what the user is concerned about when requesting a consultation. This may cause fatal errors when a response is emotionally friendly but non-informative. Thus, to evaluate how well a dialogue satisfies the user's requirements, it is essential to have a conceptual understanding of the products to determine whether the response has addressed the user's question. In this paper, we propose a knowledge-enhanced USE model named CoRe-USE, which introduces Conceptual Knowledge Reasoning for USE in E-Commerce Dialogue Systems. We first design a simple yet efficient entity linking and relation selection module enabling conceptual reasoning in each dialogue. Then, we propose a hierarchical encoder to capture the contextual information in multi-turn dialogues. Finally, we introduce a knowledge enhancement module to fuse conceptual reasoning into contextual embeddings to produce USE. For evaluation, we conduct experiments on three real-world datasets in various scenarios; the results demonstrate the effectiveness and robustness of CoRe-USE compared with SOTA baselines.
SC-DAG: Semantic-Constrained Diffusion Attacks for Stealthy Exposure Manipulation in Visually-Aware Recommender Systems
Visually-aware recommender systems (VARS) have become increasingly prevalent in various online services, integrating visual features of items to enhance recommendation quality. However, VARS introduce new security vulnerabilities: malicious attackers can perform visual shilling attacks that manipulate recommendation lists by uploading generated images with visually imperceptible perturbations. While prior research has explored such threats to help service providers enhance their systems, existing visual shilling attack methods still suffer from uncontrolled pixel-space perturbation, an energy dispersion dilemma, and semantic misalignment in reference selection. In this work, we present Semantic-Constrained Diffusion Adversarial Generation (SC-DAG) for visual shilling attacks. SC-DAG overcomes key limitations of previous methods by focusing perturbations on semantically meaningful image regions through contour-aware segmentation, guiding adversarial generation in latent space using a conditional diffusion process, and applying a hybrid reference image selection strategy that balances popularity and semantic similarity. Extensive experiments performing visual shilling attacks against multiple VARS models show that SC-DAG achieves state-of-the-art attack performance in elevating target items' rankings, while maintaining strong perceptual indistinguishability and minimal impact on the system's overall recommendation performance. Our work offers insights into leveraging structured semantic priors for more sophisticated adversarial manipulations against VARS and also highlights the necessity for developing more robust VARS models resilient to visual shilling attacks. We provide our implementation at https://github.com/KDEGroup/SC-DAG.
Crocodile: Cross Experts Covariance for Disentangled Learning in Multi-Domain Recommendation
Multi-domain learning (MDL) has become a prominent topic in enhancing the quality of personalized services. It is critical to learn commonalities between domains while preserving the distinct characteristics of each domain. However, this leads to a challenging dilemma in MDL. On the one hand, a model needs to leverage domain-aware modules such as experts or embeddings to preserve each domain's distinctiveness. On the other hand, real-world datasets often exhibit long-tailed distributions across domains, where some domains may lack sufficient samples to effectively train their specific modules. Unfortunately, nearly all existing work falls short of resolving this dilemma. To this end, we propose a novel Cross-experts Covariance Loss for Disentangled Learning model (Crocodile), which employs multiple embedding tables to make the model domain-aware at the embedding level, which accounts for most of the model's parameters, and a covariance loss over these embeddings to disentangle them, enabling the model to capture diverse user interests across domains. Empirical analysis demonstrates that our method successfully addresses both challenges and outperforms all state-of-the-art methods on public datasets. During online A/B testing on Tencent's advertising platform, Crocodile achieves a 0.72% CTR lift and a 0.73% GMV lift in a primary advertising scenario. The code is openly accessible at: https://github.com/SkylerLinn/Crocodile.
EAPformer: Entropy-Aware Patch Transformer for Multivariate Long-Term Time Series Forecasting
Multivariate long-term time series forecasting is pivotal across numerous domains, yet precise predictions require a differentiated assessment of historical time segments due to their varying influence on future trends. Patch-based Transformer frameworks show promise for capturing local temporal patterns. However, they face limitations with static patching, which disrupts temporal continuity, fails to adapt to shifts between periodic and volatile patterns, and overlooks dynamic interactions between time segments and variables. To address these limitations, we propose Entropy-Aware Patch Transformer (EAPformer) which dynamically segments time series for differentiated assessments of historical patterns. Specifically, we overcome static patching limitations by leveraging temporal entropy to dynamically adjust patch boundaries through a two-stage policy, achieving interpretable and context-sensitive segmentation. Subsequently, we adapt EAPformer to periodic and volatile dynamics by employing entropy-aware segmentation that captures distinct temporal patterns across diverse segments. Finally, we further capture dynamic interactions across time segments and variables by introducing a multi-dimensional dependency learning architecture. Additionally, a gated fusion mechanism integrates local and global patterns, enhancing robustness. Extensive experiments on eight public benchmarks demonstrate that EAPformer outperforms state-of-the-art models, achieving superior accuracy across all metrics.
From Intents to Conversations: Generating Intent-Driven Dialogues with Contrastive Learning for Multi-Turn Classification
In conversational AI systems, a critical challenge in training effective multi-turn intent classification models lies in the generation of large-scale, domain-specific, multilingual dialogue datasets. In this paper, we introduce Chain-of-Intent, a novel framework that integrates Hidden Markov Models (HMMs) with Large Language Models (LLMs) to generate intent-driven, context-aware dialogues through self-play. Our method first extracts domain-specific intent transition patterns from real-world e-commerce chat logs, which guide the modeling of turn-level dynamics and intent sequences. LLMs are then employed to parameterize the emission probabilities of HMMs, enabling the generation of natural, coherent utterances aligned with predicted intents and dialogue context. We also propose MINT-CL, a multi-task contrastive learning framework for multi-turn intent classification, which improves performance while reducing dependence on large-scale annotated datasets. Empirical results demonstrate that our approach outperforms competitive baselines in dialogue generation quality and classification accuracy, particularly in multilingual settings. To facilitate future research, we release MINT-E, a comprehensive, multilingual, intent-aware multi-turn dialogue corpus derived from the e-commerce domain.
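A toy sketch of the HMM side of Chain-of-Intent: sample a turn-level intent sequence from start and transition probabilities, then let an LLM verbalize each intent. The intent set and probabilities here are placeholders, and the LLM-parameterized emission step is omitted.

    import numpy as np

    def sample_intent_sequence(start_probs, trans_probs, intents, max_turns=6, seed=0):
        """Sample a turn-level intent sequence from an HMM-style transition matrix.
        start_probs: (K,), trans_probs: (K, K), intents: list of K intent names (illustrative)."""
        rng = np.random.default_rng(seed)
        state = rng.choice(len(intents), p=start_probs)
        sequence = [intents[state]]
        for _ in range(max_turns - 1):
            state = rng.choice(len(intents), p=trans_probs[state])
            sequence.append(intents[state])
        return sequence  # each intent would then condition an LLM prompt that emits the utterance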
Improved Personalized Headline Generation via Denoising Fake Interests from Implicit Feedback
Accurate personalized headline generation hinges on precisely capturing user interests from historical behaviors. However, existing methods neglect personalization-irrelevant click noise throughout historical clickstreams, which may lead to hallucinated headlines that deviate from genuine user preferences. In this paper, we reveal the detrimental impact of click noise on personalized generation quality through rigorous analysis in both user and news dimensions. Based on these insights, we propose a novel Personalized Headline Generation framework via Denoising Fake Interests from Implicit Feedback (PHG-DIF). PHG-DIF first employs dual-stage filtering to effectively remove clickstream noise, identified by short dwell times and abnormal click bursts, and then leverages multi-level temporal fusion to dynamically model users' evolving and multi-faceted interests for precise profiling. Moreover, we release DT-PENS, a new benchmark dataset comprising the click behavior of 1,000 carefully curated users and nearly 10,000 annotated personalized headlines with historical dwell time annotations. Extensive experiments demonstrate that PHG-DIF substantially mitigates the adverse effects of click noise and significantly improves headline quality, achieving state-of-the-art (SOTA) results on DT-PENS. Our framework implementation and dataset are available at https://github.com/liukejin-up/PHG-DIF.
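The dual-stage filtering can be pictured with a small pandas sketch that drops short-dwell clicks and clicks falling inside a trailing-window burst. The thresholds, window size, and column names are illustrative assumptions, not the values or interface used by PHG-DIF.

    import pandas as pd

    def filter_clickstream(clicks, min_dwell_s=10, burst_window="1min", max_burst=5):
        """Drop short-dwell clicks and abnormal click bursts from a clickstream.
        clicks: DataFrame with columns [user_id, timestamp (datetime), dwell_time]."""
        clicks = clicks[clicks["dwell_time"] >= min_dwell_s].sort_values(["user_id", "timestamp"])
        kept = []
        for _, user_clicks in clicks.groupby("user_id"):
            # Count clicks in a trailing time window; bursts suggest accidental or noisy interaction.
            counts = (user_clicks.set_index("timestamp")["dwell_time"]
                                 .rolling(burst_window).count().to_numpy())
            kept.append(user_clicks[counts <= max_burst])
        return pd.concat(kept, ignore_index=True)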
AdaPatch: Adaptive Patch-Level Modeling for Non-Stationary Time Series Forecasting
Time series forecasting has witnessed significant advancements through deep learning techniques. However, most existing methods struggle in non-stationary environments, where data distributions evolve over time due to concept drift. To address the challenge of non-stationarity in time series, various stabilization techniques have been proposed to mitigate temporal variations. Nonetheless, these methods operate at the instance level, assuming a homogeneous distribution across all time steps within an instance and relying on fixed statistical normalization. This limits their ability to effectively capture fine-grained distributional shifts. In this paper, we introduce AdaPatch, a novel forecasting model specifically designed to tackle non-stationary multivariate time series. AdaPatch addresses intra-instance distributional shifts by adopting an adaptive scheme for patch-level encoding and normalization, which makes the model capture fine-grained temporal variations more effectively. To further enhance the quality of representations, AdaPatch incorporates a patch reconstruction branch and jointly optimizes a reconstruction loss alongside the forecasting objective. This auxiliary path serves as an implicit regularization mechanism, guiding the encoder to retain meaningful local temporal structures. Furthermore, to enable AdaPatch to better model complex local dynamics, we propose a patch-based predictive decoding strategy that leverages the decoder from the reconstruction branch to replace conventional point-wise forecasting with a more structured patch-level prediction mechanism. Extensive experiments conducted on six real-world multivariate time series datasets demonstrate that AdaPatch achieves superior performance compared to several state-of-the-art baselines, highlighting its effectiveness and strong generalization capability. Our code and data are publicly available at https://github.com/iuaku/AdaPatch.
LLMCBR: Large Language Model-based Multi-View and Multi-Grained Learning for Bundle Recommendation
The exploration of bundle recommendation has garnered significant attention for its potential to enhance user experience and augment business sales. Previous research in this domain has primarily focused on modeling user-item and user-bundle interactions, utilizing multi-view collaboration to bolster the accuracy of bundle recommendations. Nevertheless, existing methodologies exhibit limitations, notably in the inadequate modeling of multi-view information and the absence of multi-grained details. Consequently, addressing the intricate correlation among users, items, and bundles necessitates a sophisticated approach capable of capturing both global and local nuances. We present a novel framework named Large Language Model-based Multi-View and Multi-Grained Learning for Bundle Recommendation (LLMCBR). We introduce an LLM-based semantic refinement module to summarize and encode bundle-level knowledge. To bridge the gap between semantic representation and collaborative signals, we design an adaptation strategy. Furthermore, LLMCBR leverages multi-view and multi-granular modeling to unify collaborative signals. Specifically, LLMCBR integrates item preferences within both bundle-view and item-view, thereby augmenting the comprehensiveness of multi-view data. Following this integration, each view undergoes stratification into multiple granularities to facilitate the acquisition of multi-grained details. We introduce a multiple contrastive instance mechanism to regulate the influence of different granularities and views. This mechanism empowers the model to comprehend complex consumer behaviors across various dimensions. LLMCBR is extensively evaluated over three real-world datasets, and the experimental results demonstrate its superiority.
Masked Graph Distance Network for Accurate Subgraph Similarity Computation
Subgraph similarity search aims to identify target graphs in the database that approximately contain the query graph, which is a fundamental problem in graph analysis. As a key measure for subgraph similarity computation, Subgraph Edit Distance (SED) has garnered significant research attention. Unfortunately, the exact computation of SED is an NP-hard problem. In recent years, some studies have attempted to leverage Graph Neural Networks (GNNs) to learn SED. However, existing GNN-based methods suffer from two significant limitations: (1) They rely on node-centric message passing, which cannot fully capture the impact of graph topology changes caused by graph edit operations. (2) They struggle to handle the asymmetry of SED, making it challenging to balance the scale differences between the input graphs and their distances in the representation space. To address these issues, this paper proposes a novel Masked Graph Distance Network (MGDN) for accurate SED approximation. First, MGDN utilizes a unified graph encoder to perform message passing based on the original graph structure and its dual hypergraph, effectively capturing the impact of node- and edge-specific edits. Then, we introduce an adaptive graph masking module that flexibly assigns masking scores to nodes and edges in the target graph to address asymmetry. Using multi-head masking, we re-encode the input graphs to focus on substructures relevant to SED computation. Finally, a multi-view predictor is employed at the graph level to approximate the SED, enhancing estimation accuracy by integrating information from multiple perspectives. Extensive experiments on nine benchmark datasets demonstrate that MGDN significantly outperforms state-of-the-art methods.
Seeing Sequences like Humans: Pattern Classification Driven Time-Series Forecasting via Vision Language Models
Time-series forecasting is critical to highly data-dependent domains such as energy, healthcare, and transportation. Although Large Language Models have recently been explored for this task, their performance is hindered by a modality gap: numerical sequences poorly align with text-based inputs, and direct alignment often introduces noise. In contrast, human experts rarely predict directly from numbers; they first inspect line charts to recognize overall patterns and then apply simple models for forecasting. Inspired by this workflow, we propose VisMoE, a Vision-Language-Model-driven Mixture-of-Experts framework. In VisMoE, each sequence is transformed into a line-chart image, enabling a VLM to classify it into distinct temporal regimes. Based on this classification, VisMoE routes the sequence to lightweight specialized experts operating alongside a global predictor, whose outputs are fused for final forecasts. This human-inspired design preserves semantic understanding, reduces modality misalignment, and improves computational efficiency. Extensive experiments across multiple benchmarks demonstrate that VisMoE achieves state-of-the-art forecasting accuracy while remaining highly efficient. Our code is available at https://github.com/Liu905169/VisMoE.
Enabling Group Fairness in Machine Unlearning via Distribution Correction
Machine unlearning is a recently developed technique to remove the influence of specific data points from a trained model. However, most machine unlearning approaches focus on preserving model performance, which may inadvertently introduce bias. From a preliminary study, we found that a model can become more biased after applying unlearning algorithms. To address this issue, we propose FMU (Fair Machine Unlearning), which ensures group fairness throughout the unlearning process. Specifically, FMU first withdraws the model updates for batches containing unlearning requests to protect privacy. It then removes model updates from additional sampled batches that carry reversed sensitive attributes linked to the same requests, mitigating newly introduced bias. Our experiments compare FMU with standard machine unlearning baselines and one fair unlearning method. Results show that FMU achieves superior fairness while maintaining privacy and delivering accuracy comparable to full retraining. Furthermore, FMU remains effective across diverse unlearning requests involving varying data distributions. Being orthogonal to specific unlearning and debiasing techniques, FMU provides a flexible foundation for more advanced fair machine unlearning research.
Enhancing Recommendation with Reliable Multi-profile Alignment and Collaborative-aware Contrastive Learning
Recent studies have explored the integration of Large Language Models (LLMs) into recommender systems to enhance the semantic understanding of users and items. While traditional collaborative filtering approaches primarily rely on interaction histories, LLM-enhanced methods attempt to construct comprehensive profiles by leveraging descriptive metadata and user-generated reviews. The semantic representations of these profiles are then aligned with recommender embeddings to enhance the performance of recommender systems. However, the effectiveness of such approaches heavily depends on the quality of the generated profiles, which face several critical challenges: inaccurate profiles, insufficient information, and an information gap between semantic representations and recommender embeddings. To tackle these challenges, we propose a novel framework with reliable multi-profile alignment and collaborative-aware contrastive learning. Specifically, we introduce a profile generation method combining Chain-of-Thought (CoT) prompting and self-reflection to address the issue of inaccurate profiles. To alleviate the problem of insufficient information, we introduce an interactive profile construction mechanism that aggregates and summarizes common characteristics from users' and items' neighbors in the user-item graph. To bridge the information gap between semantic representations and recommender embeddings, we propose interactive information fusion (IIF), which aggregates semantic representations from neighbors and employs supervised contrastive learning to guide representation learning. Furthermore, we propose a multi-profile alignment framework that aligns recommender embeddings with both basic profiles and interactive profiles through deduplicated contrastive objectives, facilitating effective semantic-behavioral alignment. Extensive experiments on three public datasets and six base recommenders demonstrate that our method consistently outperforms strong LLM-based baselines, achieving an average improvement of 2.93% in Recall@20 and 2.64% in NDCG@20.
Structure-Attribute Transformations with Markov Chain Boost Graph Domain Adaptation
Graph domain adaptation has gained significant attention in label-scarce scenarios across different graph domains. Traditional approaches to graph domain adaptation primarily focus on transforming node attributes over raw graph structures and aligning the distributions of the transformed node features across networks. However, these methods often struggle with the underlying structural heterogeneity between distinct graph domains, which leads to suboptimal distribution alignment. To address this limitation, we propose Structure-Attribute Transformation with Markov Chain (SATMC), a novel framework that sequentially aligns distributions across networks via both graph structure and attribute transformations. To mitigate the negative influence of domain-private information and further enhance the model's generalization, SATMC introduces a private domain information reduction mechanism and an empirical Wasserstein distance. Theoretical proofs suggest that SATMC can achieve a tighter error bound for cross-network node classification compared to existing graph domain adaptation methods. Extensive experiments on nine pairs of publicly available cross-domain datasets show that SATMC outperforms state-of-the-art methods in the cross-network node classification task. The code is available at https://github.com/GiantZhangYT/SATMC.
Chunked Data Shapley: A Scalable Dataset Quality Assessment for Machine Learning
As the volume and diversity of available datasets continue to increase, assessing data quality has become crucial for reliable and efficient Machine Learning analytics. A modern, game-theoretic approach for evaluating data quality is the notion of Data Shapley which quantifies the value of individual data points within a dataset. State-of-the-art methods to scale the NP-hard Shapley computation also face severe challenges when applied to large-scale datasets, limiting their practical use. In this work, we present a Data Shapley approach to identify a dataset's high-quality data tuples, Chunked Data Shapley (C-DaSh). C-DaSh scalably divides the dataset into manageable chunks and estimates the contribution of each chunk using optimized subset selection and single-iteration stochastic gradient descent. This approach drastically reduces computation time while preserving high quality results. We empirically benchmark our method on diverse real-world classification and regression tasks, demonstrating that C-DaSh outperforms existing Shapley approximations in both computational efficiency (achieving speedups between 80× - 2300×) and accuracy in detecting low-quality data regions. Our method enables practical measurement of dataset quality on large tabular datasets, supporting both classification and regression pipelines.
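A minimal Monte Carlo sketch of chunk-level valuation under stated assumptions: split the data into chunks, sample chunk permutations, and credit each chunk with its marginal validation gain (empty-set baseline taken as 0 for simplicity). C-DaSh's optimized subset selection and single-iteration SGD estimator are not reproduced here.

    import numpy as np
    from sklearn.base import clone

    def chunk_contributions(model, X, y, X_val, y_val, n_chunks=20, n_perms=8, seed=0):
        """Estimate per-chunk contributions via Monte Carlo marginal gains (Shapley-style).
        Chunks replace individual points as the valuation unit; n_perms keeps it cheap."""
        rng = np.random.default_rng(seed)
        chunks = np.array_split(rng.permutation(len(X)), n_chunks)
        values = np.zeros(n_chunks)
        for _ in range(n_perms):
            order, idx, prev = rng.permutation(n_chunks), np.array([], dtype=int), 0.0
            for c in order:
                idx = np.concatenate([idx, chunks[c]])
                score = clone(model).fit(X[idx], y[idx]).score(X_val, y_val)
                values[c] += score - prev      # marginal gain of adding this chunk
                prev = score
        return values / n_perms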
Harnessing Large Language Models for Group POI Recommendations
The rapid proliferation of Location-Based Social Networks (LBSNs) has underscored the importance of Point-of-Interest (POI) recommendation systems in enhancing user experiences. While individual POI recommendation methods leverage users' check-in histories to provide personalized suggestions, they struggle to address scenarios requiring group decision-making. Group POI recommendation systems aim to satisfy the collective preferences of multiple users, but existing approaches face two major challenges: diverse group preferences and extreme data sparsity in group check-in data. To overcome these challenges, we propose LLMGPR, a novel framework that leverages large language models (LLMs) for group POI recommendations. LLMGPR introduces semantic-enhanced POI tokens and incorporates rich contextual information to model the diverse and complex dynamics of group decision-making. To further enhance its capabilities, we developed a sequencing adapter using Quantized Low-Rank Adaptation (QLoRA), which aligns LLMs with group POI recommendation tasks. To address the issue of sparse group check-in data, LLMGPR employs an aggregation adapter that integrates individual representations into meaningful group representations. Additionally, a self-supervised learning (SSL) task is designed to predict the purposes of check-in sequences (e.g., business trips and family vacations), thereby enriching group representations with deeper semantic insights. Extensive experiments demonstrate the effectiveness of LLMGPR, showcasing its ability to significantly enhance the accuracy and robustness of group POI recommendations.
ParaStyleTTS: Toward Efficient and Robust Paralinguistic Style Control for Expressive Text-to-Speech Generation
Controlling speaking style in text-to-speech (TTS) systems has become a growing focus in both academia and industry. While many existing approaches rely on reference audio to guide style generation, such methods are often impractical due to privacy concerns and limited accessibility. More recently, large language models (LLMs) have been used to control speaking style through natural language prompts; however, their high computational cost, lack of interpretability, and sensitivity to prompt phrasing limit their applicability in real-time and resource-constrained environments. In this work, we propose ParaStyleTTS, a lightweight and interpretable TTS framework that enables expressive style control from text prompts alone. ParaStyleTTS features a novel two-level style adaptation architecture that separates prosodic and paralinguistic speech style modeling. It allows fine-grained and robust control over factors such as emotion, gender, and age. Unlike LLM-based methods, ParaStyleTTS maintains consistent style realization across varied prompt formulations and is well-suited for real-world applications, including on-device and low-resource deployment. Experimental results show that ParaStyleTTS generates high-quality speech with performance comparable to state-of-the-art LLM-based systems while being 30x faster, using 8x fewer parameters, and requiring 2.5x less CUDA memory. Moreover, ParaStyleTTS exhibits superior robustness and controllability over paralinguistic speaking styles, providing a practical and efficient solution for style-controllable text-to-speech generation. Demo can be found at https://parastyletts.github.io/ParaStyleTTS_Demo/. Code can be found at https://github.com/haoweilou/ParaStyleTTS.
Dual Denoising Diffusion Model for Session-based Social Recommendation
Session-based Social Recommendation (SSR) enhances item recommendations by incorporating both session interactions and social network data. Despite recent progress, existing SSR methods, primarily based on Graph Neural Networks, are highly susceptible to session noise (irrelevant or unintentional interactions) and social noise (misleading signals from connected users). Prior denoising strategies often rely on heuristic resampling or reweighting techniques, which lack generalizability and robustness across diverse datasets. In this work, we explore a novel direction by introducing diffusion models for denoising in SSR. However, applying diffusion to SSR presents unique challenges due to heterogeneous data modalities, incompatible noise patterns, and the absence of semantic guidance during the reverse process. To overcome these challenges, we propose D3MRec, a Dual Denoising Diffusion Model specifically designed for SSR. D3MRec employs a dual-branch architecture that independently models session sequences and social graphs, applying denoising diffusion in their respective hidden representation spaces. This decoupled design preserves the structural integrity of each modality while enabling modality-specific denoising. Moreover, we introduce cross-modal guidance by leveraging collaborative signals from the other branch during the reverse diffusion process, enhancing alignment between session intents and social preferences. The dual denoising processes not only mitigate noise within each modality but also serve as mutual priors, facilitating robust and consistent representation learning across modalities. Extensive experiments on multiple benchmarks show that D3MRec significantly outperforms state-of-the-art models, particularly under noisy conditions, demonstrating its effectiveness and robustness.
Collaborative Interest Mining Network for Knowledge Graph-based Recommendation
Knowledge graphs contain rich semantic information and have been widely applied in recommender systems. However, most existing knowledge graph-based recommendation methods primarily focus on modeling item-side semantics and overlook the critical role of knowledge graphs in enhancing user representations. In this work, we propose a novel recommendation model called Collaborative Interest Mining Network for Knowledge Graph-based Recommendation (CIMNK), which leverages the knowledge graph to mine collaborative interest similarity (i.e., the similarity between users who share the same interests), thereby enhancing the quality of user embeddings. Specifically, CIMNK first constructs a user-interest graph by performing fine-grained filtering over entities and distinguishing between different relation types, to explicitly represent direct associations between users and interest entities. Subsequently, CIMNK introduces the Relation-aware Collaborative Interest Mining Module (RCIM), which conducts graph representation learning on the user-interest graph to mine and integrate collaborative interest information across different relation types. Finally, we design an interest-aware loss function to supervise the learning of collaborative interest similarity. Extensive experiments on three public benchmark datasets demonstrate that CIMNK outperforms state-of-the-art methods. The implementations are available at: https://github.com/JieLuoRoger/CIMNK-Pytorch.
LEI: Reinforced Multi-Object Cache Admission
The Hot Object Cache (HOC) admission policy is one of the core technologies in Content Delivery Network (CDN) cache management and plays a critical role in the broader Computing Power Network (CPN) environment. Existing policies primarily employ heuristic- and threshold-based methods, which are simple and efficient but fail both to capture latent dependencies among requested objects and to support informed admission decisions. We propose a reinforcement learning approach called LEI for multi-object CDN admission (i.e., request sequences spanning multiple time instants, with multiple objects requested at each instant). LEI integrates three key components: a Multi-Object Time Encoding (MO-TE) mechanism, which uses density-based representations to model multi-object temporal sequences; a Buffer Mechanism, which addresses the reward-latency issue by buffering historical decision information; and a dual-head network architecture with a two-stage training strategy, which enhances the model's admission decision-making capability and long-term stability. Experiments on four public datasets demonstrate that LEI improves the HOC request hit rate by 5-12% and the HOC byte hit rate by 5-42% compared with state-of-the-art methods. It also reduces disk cache (DC) read and write rates.
MARM: Unlocking the Recommendation Cache Scaling-Law through Memory Augmentation and Scalable Complexity
Scaling laws have guided language model design in recent years (e.g., GPTs), enabling the estimation of expected model performance with respect to the number of learnable parameters and the scale of training samples. It is worth noting that the scaling laws of NLP cannot be directly applied to recommendation systems for the following reasons: (1) The number of training samples and model parameters is typically not the bottleneck. Our recommendation system can generate over 50 billion user samples daily, and such a massive amount of training data can easily allow our model parameters to exceed 200 billion, surpassing many LLMs (about 100B). (2) It is essential to control FLOPs carefully in recommendation systems. In training, we need to process a vast number of recommendation samples every day, and during online inference, we must respond within milliseconds (LLMs usually take a few seconds). Given these differences from LLMs, we conclude that for a RecSys model, FLOPs, rather than the number of parameters, is the more expensive factor that requires careful control. In this paper, we propose our milestone work, MARM (Memory Augmented Recommendation Model), which successfully explores a new cache scaling law. By caching part of the complex module computation results, MARM extends the single-layer attention-based sequence-interest modeling module to a multi-layer setting at minor inference FLOPs cost (reducing the module's time complexity from O(n^2 * d) to O(n * d)). Equipped with this cache idea, our MARM solution significantly overcomes computational bottlenecks and can seamlessly empower all interest-extraction modules for user sequences, and even other models. To support MARM, we construct a 60TB cache storage center for offline training and online serving. Comprehensive experimental results show that MARM brings a 0.43% offline GAUC improvement and a 2.079% online gain in play time per user. MARM has been deployed on a real-world short-video platform, serving tens of millions of users daily.
MetaCAN: Improving Generalizability of Few-shot Anomaly Detection with Meta-learning
Few-shot Anomaly Detection (AD) for images aims to detect anomalies with few-shot normal samples from the target dataset. It is a crucial task when only a few samples can be obtained, and it is challenging since it needs to generalize to different domains. Existing methods try to enhance the generalizability of AD by incorporating large vision-language models (LVLMs). However, how to transform category semantic information in LVLMs into anomaly information to improve the generalizability of AD remains a challenge for existing methods. To address this challenge, we propose a few-shot AD method called MetaCAN, a novel category-to-anomaly network trained with an AD meta-learning scheme based on an LVLM. Specifically, MetaCAN constructs auxiliary training data and multiple tasks based on different categories to perform AD meta-learning, which ensures that optimization is directed toward optimal anomaly detection across all categories. Moreover, MetaCAN introduces an image-image anomaly discriminator and an image-text anomaly detector to fully exploit the powerful multimodal semantic representations during auxiliary training. Once trained on auxiliary datasets, MetaCAN can be applied directly to other target datasets without retraining. Extensive experiments on six real-world datasets demonstrate that MetaCAN achieves state-of-the-art performance on cross-domain and cross-category anomaly detection tasks compared with existing methods.
Multimodal Sentiment Analysis with Multi-Perspective Thinking via Large Multimodal Models
Multimodal sentiment analysis (MSA) is attracting increasing attention from researchers. Existing studies on MSA typically rely on surface-level feature extraction and fusion that can be directly obtained from multimodal data, which may often ignore the underlying semantic connection between images and texts. Recent progress in large multimodal models (LMMs) has demonstrated their impressive reasoning abilities, which can be leveraged to improve traditional MSA approaches by providing a deeper understanding of the semantic connection between the modalities. To address this issue, we propose a novel framework called MPT that combines traditional MSA approaches with Multi-Perspective Thinking from LMMs to improve prediction outcomes. Specifically, MPT instructs traditional multimodal deep learning models to understand multi-perspective rationales for different sentiment polarities, augmenting their knowledge base and enhancing their ability to make more accurate predictions. Extensive experiments on four refined datasets show that MPT not only delivers better performance than existing methods, but also demonstrates good cross-modal understanding ability for recognizing user sentiment. The codes and datasets can be accessed here: https://github.com/RMJHQwQ/MPT.
Reconsidering the Performance of GAE in Link Prediction
Recent advancements in graph neural networks (GNNs) for link prediction have introduced sophisticated training techniques and model architectures. However, reliance on outdated baselines may exaggerate the benefits of these new approaches. To tackle this issue, we systematically explore Graph Autoencoders (GAEs) by applying model-agnostic tricks in recent methods and tuning hyperparameters. We find that a well-tuned GAE can match the performance of recent sophisticated models while offering superior computational efficiency on widely used link prediction benchmarks. Our approach delivers substantial performance gains on datasets where structural information dominates and feature data is limited. Specifically, our GAE achieves a state-of-the-art (SOTA) Hits@100 score of 78.41% on the ogbl-ppa dataset. Furthermore, we examine the impact of various tricks to uncover the reasons behind our success and to guide the design of future methods. Our study emphasizes the critical need to update baselines for a more accurate assessment of progress in GNNs for link prediction. Our code is available at https://github.com/GraphPKU/Refined-GAE.
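As a point of reference for what a plain GAE looks like before any of the tuning tricks discussed above, here is a minimal sketch in plain PyTorch with a dense normalized adjacency and an inner-product decoder; the hyperparameters and model-agnostic tricks studied in the paper are not reproduced.

# Minimal Graph Autoencoder sketch (dense adjacency for brevity; no paper-specific tricks).
import torch
import torch.nn as nn

def normalize_adj(adj):
    """Symmetric normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a = adj + torch.eye(adj.size(0))
    d = a.sum(1).pow(-0.5)
    return d.unsqueeze(1) * a * d.unsqueeze(0)

class GAE(nn.Module):
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim, bias=False)
        self.w2 = nn.Linear(hid_dim, out_dim, bias=False)

    def encode(self, x, a_hat):
        h = torch.relu(a_hat @ self.w1(x))   # first GCN layer
        return a_hat @ self.w2(h)            # second GCN layer

    def decode(self, z, edge_index):
        src, dst = edge_index                # inner-product decoder
        return (z[src] * z[dst]).sum(-1)     # logits for candidate edges

# Training scores positive edges against sampled negatives with binary cross-entropy
# on the decoder logits.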
As Good as It KAN Get: High-Fidelity Audio Representation
Implicit neural representations (INR) have gained prominence for efficiently encoding multimedia data, yet their applications in audio signals remain limited. This study introduces the Kolmogorov-Arnold Network (KAN), a novel architecture using learnable activation functions, as an effective INR model for audio representation. KAN demonstrates superior perceptual performance over previous INRs, achieving the lowest Log-Spectral Distance of 1.29 and the highest Perceptual Evaluation of Speech Quality of 3.57 for 1.5 s of audio. To extend KAN's utility, we propose FewSound, a hypernetwork-based architecture that enhances INR parameter updates. FewSound outperforms the state-of-the-art HyperSound, with a 33.3% improvement in MSE and 60.87% in SI-SNR. These results position KAN as a robust and adaptable audio representation with the potential for scalability and integration into various hypernetwork frameworks.
Tight Bounds for Jensen's Gap with Applications to Variational Inference
Since its original formulation, Jensen's inequality has played a fundamental role across mathematics, statistics, and machine learning, with its probabilistic version highlighting the nonnegativity of the so-called Jensen's gap, i.e., the difference between the expectation of a convex function and the function at the expectation. Of particular importance is the case when the function is logarithmic, as this setting underpins many applications in variational inference, where the term variational gap is often used interchangeably. Recent research has focused on estimating the size of Jensen's gap and establishing tight lower and upper bounds under various assumptions on the underlying function and distribution, driven by practical challenges such as the intractability of log-likelihood in graphical models like variational autoencoders (VAEs). In this paper, we propose new, general bounds for Jensen's gap that accommodate a broad range of assumptions on both the function and the random variable, with special attention to exponential and logarithmic cases. We provide both analytical and empirical evidence for the performance of our method. Furthermore, we relate our bounds to the PAC-Bayes framework, providing new insights into generalization performance in probabilistic models.
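As a small worked illustration of the logarithmic case (not one of the paper's new bounds), the Jensen's gap of a log-normal random variable is available in closed form:

\[
\mathcal{J}(X) \;=\; \log \mathbb{E}[X] \;-\; \mathbb{E}[\log X] \;\ge\; 0 .
\]
For $X = e^{Z}$ with $Z \sim \mathcal{N}(\mu, \sigma^{2})$,
\[
\mathbb{E}[X] = e^{\mu + \sigma^{2}/2}, \qquad \mathbb{E}[\log X] = \mu,
\qquad\text{so}\qquad \mathcal{J}(X) = \frac{\sigma^{2}}{2},
\]
so the gap grows linearly with the variance of the log-scale, which is exactly the kind of quantity that general lower and upper bounds on Jensen's gap aim to control.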
FinD3: A Dual 3D State Space Model with Dynamic Hypergraph for Financial Stock Prediction
The financial market plays a crucial role in the modern economy by influencing capital allocation, corporate valuation, and investor behavior. However, its complex dependencies and non-stationary dynamics present significant challenges for financial stock prediction. Previous predictive approaches are typically categorized into Univariate Time Series (UTS) and Multivariate Time Series (MTS) paradigms. UTS methods overlook both cross-feature and cross-stock influences, while MTS methods can capture only one of these at a time. Although some recent approaches claim to model 3D Multivariate Time Series (3D-MTS) dependencies, they often discard substantial information and fail to capture the dynamics of the stock market. To address these limitations, we propose FinD3, a Financial 3D model using Dual cubic state spaces and Dynamic hypergraphs. To extract the inherent complex relationships in 3D-MTS, we propose a novel Dual Cubic State Space Model (DCSSM) to capture both cross-feature and cross-stock patterns. Furthermore, to more accurately reflect the dynamics of the stock market, we present an Evolving Hypergraph Attention (EHA) module, which captures dynamic changes in financial markets and updates the hypergraph based on an a priori hypergraph. Experimental results on two real-world stock market datasets demonstrate that FinD3 achieves state-of-the-art quantitative trading performance, offering a promising solution to practical quantitative trading challenges. The code is available at: https://github.com/decisionintelligence/FinD3.
Dense Retrieval for Aggregated Search
To satisfy users' diverse information needs, aggregated search systems need to integrate heterogeneous results, with rich but differing structural information, from a variety of verticals, such as news search, video search, and product search. A key challenge in aggregated search is to effectively and efficiently retrieve the most relevant results from the large amount of heterogeneous information across verticals. With the development of deep learning and pre-trained language models (PLMs), many researchers resort to Dense Retrieval (DR) models for unified, efficient embedding-based retrieval and better retrieval performance. However, existing dense retrieval models have limitations in: 1) capturing the structural information of search results; and 2) generalizing across different vertical domains where the search results have different or even unseen structures. In this paper, we tackle these limitations and propose an effective and efficient dense retrieval model for aggregated search. Specifically, we utilize a deep prompt-tuning technique to make the pre-trained model easily applicable to downstream vertical search tasks. To capture structural knowledge, we design a Graph Neural Network (GNN)-based structure prompt that encodes how text segments are organized in the original semi-structured data. We further incorporate a distributional prompt to model the theme of each domain and enhance cross-domain generalization. Extensive experiments on real-world data collected from WeChat Search demonstrate that, for aggregated search tasks, our model achieves better performance than existing retrieval models and generalizes better to various, even unseen, vertical search tasks.
Usefulness and Diminishing Returns: Evaluating Social Information in Recommender Systems
Social recommendation, which leverages users' social information to predict users' preferences, is a popular branch of recommender systems. Many existing studies have attempted to advance the performance of collaborative filtering methods by leveraging the user-user matrix to enhance user embedding learning with users' social connections. While existing social recommender systems have demonstrated good performance in various recommendation tasks, the extent to which social information is useful in recommender systems remains unclear. This paper addresses this research gap by designing experiments to answer three research questions: (i) How useful is social information under varying user-item data sparsity? (ii) How much social information do existing social recommendation models use? (iii) How valuable is social information in cold-start situations? Working towards answering these research questions, we introduce evaluation metrics to estimate the utilization of social information in existing social recommendation models. We conducted experiments on three publicly available social recommendation datasets, and our results showed that there are diminishing returns when applying social information in recommender systems.
A Cost-Effective Framework to Evaluate LLM-Generated Relevance Judgements
Large Language Models (LLMs) have hugely impacted many research fields, including Information Retrieval (IR), where they are used for many sub-tasks, such as query rewriting and retrieval-augmented generation. At the same time, the research community is investigating whether and how to use LLMs to support, or even replace, humans in generating relevance judgments. Indeed, generating relevance judgments automatically, or integrating an LLM into the annotation process, would allow us to increase the number of evaluation collections, including for scenarios where the annotation process is particularly challenging. To validate relevance judgments produced by an LLM, they are compared with human-made relevance judgments, measuring the inter-assessor agreement between the human and the LLM. Our work introduces an innovative framework for estimating the quality of LLM-generated relevance judgments, providing statistical guarantees while minimizing human involvement. The proposed framework allows one to: i) estimate the quality of LLM-generated relevance judgments with a defined confidence while minimizing human involvement; and ii) estimate the quality of LLM-generated relevance judgments with a fixed budget while providing bounds on the estimate. Our experimental results on three well-known IR collections using multiple LLMs as assessors show that it is sufficient to assess 16% of the LLM-generated relevance judgments to estimate the LLM's performance with 95% confidence.
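To illustrate the fixed-budget flavor in its simplest possible form (this is not the paper's estimator, which provides tighter, purpose-built guarantees), one can sample a budget of LLM judgments for human review and bound the true agreement rate with a Hoeffding interval:

# Illustrative fixed-budget check (not the paper's estimator): sample a budget of LLM
# judgments for human review and bound the true agreement rate via Hoeffding's inequality.
import math
import random

def estimate_agreement(llm_labels, human_label, budget, delta=0.05, seed=0):
    """llm_labels: list of LLM judgments; human_label(i) returns the human judgment for item i.
    Returns the sampled agreement rate and the half-width of a (1 - delta) interval."""
    random.seed(seed)
    sample = random.sample(range(len(llm_labels)), budget)
    agree = sum(llm_labels[i] == human_label(i) for i in sample)
    p_hat = agree / budget
    half_width = math.sqrt(math.log(2 / delta) / (2 * budget))
    return p_hat, half_width

# With budget = 0.16 * len(llm_labels), the half-width indicates how tightly human review
# of 16% of the judgments pins down the LLM's agreement rate.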
Reverse Chain-of-Thought and Causal Path Verification: A Modular Plugin for Aligning LLMs with Knowledge Graphs
Large language models (LLMs) exhibit strong language understanding capabilities, but encounter challenges when integrating structured knowledge from knowledge graphs (KGs) for complex reasoning tasks such as knowledge graph question answering (KGQA). Existing methods often rely on prompt engineering or fixed templates, which obscure the relational structure and limit generalization. To address these limitations, this paper introduces the Reverse Chain-of-Thought (R-CoT) and Causal Path Verification Plugin, a modular framework that reconstructs retrieved KG triples into reverse chains of sub-questions. Each reasoning step is aligned with a supporting triple, forming interpretable multi-hop paths. In particular, a Semantic Causal Scoring (SCS) module is further incorporated to evaluate the causal alignment between each reverse sub-question and the original question through dynamic semantic vector matching. The SCS design avoids frequent interactions with LLMs and effectively filters irrelevant or unsupported reasoning steps. Based on the scoring results, a template-free, model-agnostic R-CoT input format is constructed as a semi-structured sequence. This design preserves the KG structure in natural language form and enables seamless integration with standard LLMs without fine-tuning. Experimental results demonstrate that the R-CoT Plugin consistently improves factual alignment, enhances reasoning stability, and outperforms conventional prompt-based methods in both accuracy and coherence.
Towards Adaptive Personalized Conversational Information Retrieval
Personalized conversational information retrieval (CIR) systems aim to satisfy users' complex information needs through multi-turn interactions by considering user profiles. However, not all search queries require personalization. The challenge lies in appropriately incorporating personalization elements into search when needed. Most existing studies implicitly incorporate users' personal information and conversational context using large language models without distinguishing the specific requirements for each query turn. Such a "one-size-fits-all" personalization strategy might lead to sub-optimal results. In this paper, we propose an adaptive personalization method, in which we first identify the required personalization level for a query and integrate personalized queries with other query reformulations to produce various enhanced queries. Then, we design a personalization-aware ranking fusion approach to assign fusion weights dynamically to different reformulated queries, depending on the required personalization level. The proposed Adaptive Personalized Conversational Information Retrieval framework APCIR is evaluated on two TREC iKAT datasets. The results confirm the effectiveness of adaptive personalization of APCIR by outperforming state-of-the-art methods.
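A bare-bones version of personalization-aware fusion can be written as weighted reciprocal-rank fusion over the reformulated queries, with a larger weight assigned to the personalized reformulation when the query is judged to need personalization; the weighting scheme below is an assumption for illustration, not APCIR's learned weights.

# Illustrative personalization-aware rank fusion (weights are assumed, not APCIR's).
from collections import defaultdict

def weighted_rrf(rankings, weights, k=60):
    """rankings: ranked doc-id lists, one per reformulated query; weights: fusion weight each."""
    scores = defaultdict(float)
    for ranking, w in zip(rankings, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += w / (k + rank)   # weighted reciprocal-rank contribution
    return sorted(scores, key=scores.get, reverse=True)

fused = weighted_rrf(
    rankings=[["d3", "d1", "d7"], ["d1", "d2", "d3"]],  # personalized vs. generic rewrite
    weights=[0.8, 0.2],                                 # query judged to need personalization
)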
EFU: Enforcing Federated Unlearning via Functional Encryption
Federated unlearning (FU) algorithms allow clients in federated settings to exercise their right to be forgotten by removing the influence of their data from a collaboratively trained model. Existing FU methods maintain data privacy by performing unlearning locally on the client side and sending targeted updates to the server without exposing forgotten data; yet they often rely on server-side cooperation, revealing the client's intent and identity without enforcement guarantees, thereby compromising autonomy and unlearning privacy. In this work, we propose EFU (Enforced Federated Unlearning), a cryptographically enforced FU framework that enables clients to initiate unlearning while concealing its occurrence from the server. Specifically, EFU leverages functional encryption to bind encrypted updates to specific aggregation functions, ensuring the server can neither perform unauthorized computations nor detect or skip unlearning requests. To further mask behavioral and parameter shifts in the aggregated model, we incorporate auxiliary unlearning losses based on adversarial examples and parameter importance regularization. Extensive experiments show that EFU achieves near-random accuracy on forgotten data while maintaining performance comparable to full retraining across datasets and neural architectures, all while concealing unlearning intent from the server. Furthermore, we demonstrate that EFU is agnostic to the underlying unlearning algorithm, enabling secure, function-hiding, and verifiable unlearning for any client-side FU mechanism that issues targeted updates.
Learning Optimal Personalised Reservation Prices in Impression Ad Auctions with Mixture Density Networks
Reservation prices have proven effective in boosting revenue in Generalised Second Price (GSP) auctions, particularly in cost-per-click (CPC) settings. However, in domains like music streaming, where ads are consumed passively without user clicks, a cost-per-impression (CPM) model is more appropriate. Additionally, in the music streaming domain, user intent is typically unknown, unlike in sponsored search, making it essential to optimally leverage all available user and contextual information when setting prices. This paper addresses the challenge of optimising reservation prices in GSP auctions with CPM pricing, adopting a personalised approach that accounts for both user- and advertiser-specific factors. Using a dataset of 100,000 auctions from a major music streaming service, we determine such optimal prices. To achieve this, we first derive the symmetric Nash equilibrium for GSP auctions in a CPM context. We then introduce a Deep Neural Network-based mixture density model that incorporates this equilibrium into its loss function. This model captures advertisers' diverse preferences by learning directly from bidding data. We show how this approach enables the computation of personalised prices for both users and advertisers, boosting auction revenue by an average of +4% across ten markets. Our study further highlights the impact of market competitiveness and advertiser preference heterogeneity on revenue gains, showing that personalised pricing greatly enhances auction performance.
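For readers unfamiliar with mixture density models, the sketch below shows a generic mixture density head in PyTorch that predicts a K-component Gaussian mixture over (log-)bids from user and context features, trained with the standard mixture negative log-likelihood; the equilibrium-based loss derived in the paper is not reproduced here.

# Generic mixture density head (PyTorch); the paper's equilibrium-based loss is omitted.
import torch
import torch.nn as nn

class MixtureDensityHead(nn.Module):
    def __init__(self, in_dim, n_components=5, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.pi = nn.Linear(hidden, n_components)         # mixture weights (logits)
        self.mu = nn.Linear(hidden, n_components)         # component means
        self.log_sigma = nn.Linear(hidden, n_components)  # component log-scales

    def forward(self, x):
        h = self.body(x)
        return self.pi(h), self.mu(h), self.log_sigma(h)

def mdn_nll(pi_logits, mu, log_sigma, y):
    """Negative log-likelihood of targets y under the predicted Gaussian mixture."""
    log_pi = torch.log_softmax(pi_logits, dim=-1)
    comp = torch.distributions.Normal(mu, log_sigma.exp())
    log_prob = comp.log_prob(y.unsqueeze(-1))             # per-component log-density
    return -torch.logsumexp(log_pi + log_prob, dim=-1).mean()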
Improving Text Embedding Models with Positive-aware Hard-negative Mining
Text embedding models have been popular for information retrieval applications such as semantic search and Question-Answering systems based on Retrieval-Augmented Generation (RAG). These models are typically Transformer models that are fine-tuned with contrastive learning objectives. One of the challenging aspects of fine-tuning embedding models is selecting high-quality hard-negative passages for contrastive learning. In this paper, we introduce a family of positive-aware mining methods that use the positive relevance score as an anchor for false-negative removal. Our methods are simple, effective, and scalable, and lead to faster training and more accurate retrieval models. We provide an ablation study on hard-negative mining methods and their configurations, exploring different teacher and base models. We further demonstrate the efficacy of our proposed mining methods at scale with the NV-Retriever-v1 model, which scored 60.9 on the MTEB Retrieval (BEIR) benchmark and placed 1st upon its publication.
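The core idea of using the positive as an anchor can be illustrated in a few lines: candidate negatives whose teacher relevance score exceeds a fraction of the positive's score are treated as likely false negatives and dropped, and the hardest survivors are kept. The threshold and top-k values below are illustrative assumptions, not the paper's recommended settings.

# Illustrative positive-aware hard-negative filter (threshold and top-k are assumptions).
def mine_hard_negatives(teacher_scores, positive_score, margin_frac=0.95, top_k=4):
    """teacher_scores: {passage_id: relevance score assigned by a teacher model}."""
    kept = {pid: s for pid, s in teacher_scores.items()
            if s < margin_frac * positive_score}         # positive-aware threshold
    ranked = sorted(kept, key=kept.get, reverse=True)    # hardest surviving negatives first
    return ranked[:top_k]

negatives = mine_hard_negatives(
    teacher_scores={"p1": 0.91, "p2": 0.70, "p3": 0.55, "p4": 0.20},
    positive_score=0.90,
)   # "p1" is discarded as a probable false negative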
Latent Variable Modeling for Robust Causal Effect Estimation
Latent variable models provide a powerful framework for incorporating and inferring unobserved factors in observational data. In causal inference, they help account for hidden factors influencing treatment or outcome, thereby addressing challenges posed by missing or unmeasured covariates. This paper proposes a new framework that integrates latent variable modeling into the double machine learning (DML) paradigm to enable robust causal effect estimation in the presence of such hidden factors. We consider two scenarios: one where a latent variable affects only the outcome, and another where it may influence both treatment and outcome. To ensure tractability, we incorporate latent variables only in the second stage of DML, separating representation learning from latent inference. We demonstrate the robustness and effectiveness of our method through extensive experiments on both synthetic and real-world datasets.
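As background, the standard partially linear DML recipe on which such approaches build can be sketched as residual-on-residual regression with cross-fitting; the latent-variable second stage proposed in the paper is not reproduced here, and the nuisance learners are arbitrary illustrative choices.

# Standard DML sketch: residual-on-residual with 2-fold cross-fitting (no latent variables).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def dml_effect(X, t, y, seed=0):
    """Estimate a constant effect theta in y = theta * t + g(X) + noise."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), 2)
    t_res, y_res = np.zeros_like(t), np.zeros_like(y)
    for k in range(2):
        train, test = folds[1 - k], folds[k]
        m = RandomForestRegressor().fit(X[train], t[train])   # nuisance E[T | X]
        g = RandomForestRegressor().fit(X[train], y[train])   # nuisance E[Y | X]
        t_res[test] = t[test] - m.predict(X[test])
        y_res[test] = y[test] - g.predict(X[test])
    return float(t_res @ y_res / (t_res @ t_res))

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
t = X[:, 0] + rng.normal(size=2000)
y = 2.0 * t + np.sin(X[:, 1]) + rng.normal(size=2000)
print(dml_effect(X, t, y))   # approximately recovers the true effect of 2.0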
Get Global Guarantees: On the Probabilistic Nature of Perturbation Robustness
Robustness is a critical requirement for deploying machine learning models in safety-sensitive domains, where even imperceptible input perturbations can lead to hazardous outcomes. However, existing robustness assessment techniques applied prior to deployment often face a trade-off between computational feasibility and measurement precision, limiting their effectiveness in practice. To address these limitations, we provide a systematic comparative study of prevailing robustness definitions and their corresponding evaluation methodologies. Building on this analysis, we propose tower robustness, a novel and practical probabilistic notion of robustness defined from a global perspective. Further, we provide upper and lower bounds on tower robustness, based on hypothesis testing, for quantitative evaluation, enabling more rigorous and efficient pre-deployment assessments. Through empirical investigation, we demonstrate that our approach provides reliable robustness assessments. These findings advance the systematic understanding of robustness and contribute a practical framework for enhancing the safety of machine learning models in safety-critical applications.
Multi-Ontology Integration with Dual-Axis Propagation for Medical Concept Representation
Medical ontology graphs map external knowledge to medical codes in electronic health records (EHRs) via structured relationships. By leveraging domain-approved connections (e.g., parent-child), predictive models can generate richer medical concept representations by incorporating contextual information from related concepts. However, existing literature primarily focuses on incorporating domain knowledge from a single ontology system, or from multiple ontology systems (e.g., diseases, drugs, and procedures) in isolation, without integrating them into a unified learning structure. Consequently, concept representation learning often remains limited to intra-ontology relationships, overlooking cross-ontology connections that could enhance the richness of healthcare representations. In this paper, we propose LINKO, a large language model (LLM)-augmented integrative ontology learning framework that leverages multiple ontology graphs simultaneously by enabling dual-axis knowledge propagation both within and across heterogeneous ontology systems to enhance medical concept representation learning. Specifically, LINKO first employs LLMs to provide a graph-retrieval-augmented initialization for ontology concept embedding, through an engineered prompt that includes concept descriptions, and is further augmented with ontology graph relations and task-specific details. Second, our method jointly learns the medical concepts in diverse ontology graphs by performing knowledge propagation in two axes: (1) intra-ontology vertical propagation across hierarchical ontology levels and (2) inter-ontology horizontal propagation within every level in parallel. Last, through extensive experiments on two public datasets, we validate the superior performance of LINKO over state-of-the-art baselines. As a plug-in encoder compatible with existing EHR predictive models, LINKO further demonstrates enhanced robustness in scenarios involving limited data availability and rare disease prediction.
PriviRec: Confidential and Decentralized Graph Filtering for Recommender Systems
Recent advances in recommender systems have shown that relying on graph filters, such as the normalized item-item adjacency matrix and the ideal low-pass filter, yields competitive performance and scales better than Graph Convolutional Network-based solutions. However, these solutions require centralizing user data, which raises concerns over data privacy, security, and the monopolization of user data by a few actors. To address those concerns, we propose PriviRec and PriviRec-k, two complementary recommendation frameworks. In PriviRec, we show that it is possible to decompose widely used filters so that they can be computed in a distributed setting using Secure Aggregation and a distributed version of the Randomized Power Method, without revealing individual users' contributions. PriviRec-k extends this approach by having users securely aggregate low-rank projections of their contributions, enabling a tunable balance between communication overhead and recommendation accuracy. We demonstrate theoretically as well as experimentally on Gowalla, Yelp2018, and Amazon-Book that our methods achieve performance comparable to centralized state-of-the-art recommender systems and superior to decentralized ones, while preserving confidentiality and low communication and computational overheads.
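For intuition, the centralized version of the randomized power method used as a building block can be sketched as subspace iteration for the top-k eigenvectors of a symmetric item-item matrix; the secure-aggregation wrapping that keeps individual contributions hidden is deliberately not shown here.

# Centralized randomized power method (subspace iteration); secure aggregation not shown.
import numpy as np

def randomized_power_method(A, k, n_iter=30, seed=0):
    """A: symmetric item-item matrix; returns approximate top-k eigenvalues and eigenvectors."""
    rng = np.random.default_rng(seed)
    Q = np.linalg.qr(rng.normal(size=(A.shape[0], k)))[0]
    for _ in range(n_iter):
        # In the federated setting, A @ Q is the quantity each user would contribute a share
        # of, with the server only observing the securely aggregated sum.
        Q = np.linalg.qr(A @ Q)[0]
    eigvals = np.diag(Q.T @ A @ Q)
    return eigvals, Q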
Addressing Personalized Bias for Unbiased Learning to Rank
Unbiased learning to rank (ULTR), which aims to learn unbiased ranking models from biased user behavior logs, plays an important role in Web search. Previous research on ULTR has studied a variety of biases in users' clicks, such as position bias, presentation bias, and outlier bias. However, existing work often assumes that the behavior logs are collected from an "average" user, neglecting the differences between users in their search and browsing behaviors. In this paper, we introduce personalized factors into the ULTR framework, which we term the user-aware ULTR problem. Through a formal causal analysis of this problem, we demonstrate that existing user-oblivious methods are biased when different users have different preferences over queries and personalized propensities of examining documents. To address such personalized bias, we propose a novel user-aware inverse-propensity-score estimator for learning-to-rank objectives. Specifically, our approach models the distribution of user browsing behaviors for each query and aggregates user-weighted examination probabilities to determine propensities. We theoretically prove that the user-aware estimator is unbiased under some mild assumptions and shows lower variance compared to the straightforward approach of calculating a user-dependent propensity for each impression. Finally, we empirically verify the effectiveness of our user-aware estimator by conducting extensive experiments on two semi-synthetic datasets and a real-world dataset.
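A toy version of the estimator's weighting step, following the abstract's description, computes the propensity of a logged click as a user-distribution-weighted examination probability; the grouping of users and the examination model below are assumptions made for illustration, not the paper's exact construction.

# Illustrative user-aware IPS weighting (user grouping and examination model are assumed).
import numpy as np

def user_aware_propensity(position, user_probs, exam_prob):
    """user_probs: P(user group | query); exam_prob[g][position]: P(examined | group g)."""
    return sum(p_g * exam_prob[g][position] for g, p_g in user_probs.items())

def ips_loss(scores, clicks, positions, user_probs, exam_prob):
    """Propensity-weighted logistic loss over logged impressions (clicks only)."""
    loss, n = 0.0, 0
    for s, c, pos in zip(scores, clicks, positions):
        if c:
            w = 1.0 / user_aware_propensity(pos, user_probs, exam_prob)
            loss += w * np.log1p(np.exp(-s))
            n += 1
    return loss / max(n, 1)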
TANDEM: Temporal Attention-guided Neural Differential Equations for Missingness in Time Series Classification
Handling missing data in time series classification remains a significant challenge in various domains. Traditional methods often rely on imputation, which may introduce bias or fail to capture the underlying temporal dynamics. In this paper, we propose TANDEM (Temporal Attention-guided Neural Differential Equations for Missingness), an attention-guided neural differential equation framework that effectively classifies time series data with missing values. Our approach integrates raw observation, interpolated control path, and continuous latent dynamics through a novel attention mechanism, allowing the model to focus on the most informative aspects of the data. We evaluate TANDEM on 30 benchmark datasets and a real-world medical dataset, demonstrating its superiority over existing state-of-the-art methods. Our framework not only improves classification accuracy but also provides insights into the handling of missing data, making it a valuable tool in practice.
Disentangling Complex Questions in LLMs via Multi-Hop Dependency Graphs
While large language models (LLMs) exhibit remarkable performance in a wide range of NLP tasks, they often struggle to interpret and reason over multi-hop questions in open-domain question answering (ODQA) settings. Although popular prompting approaches such as Chain-of-Thought and Plan-and-Solve make questions more manageable for ODQA via task decomposition, these approaches are prone to generating erroneous and redundant intermediate steps in multi-hop queries due to their limited capacity for modeling complex entity relationships. In this paper, we introduce MoDeGraph (Multi-Hop Dependency Graphs), a novel prompting approach for multi-hop QA designed to steer LLMs to extract and model entity relationships in complex questions. MoDeGraph constructs a dependency graph from LLM-generated entity-relation triples to enable more coherent and human-like multi-step reasoning. Experimental results on knowledge-intensive multi-hop QA tasks demonstrate that our approach produces more coherent and faithful reasoning chains as well as consistent increases in QA performance across several benchmark datasets.
Multimodal Sentiment Analysis via Progressive Fusion of Audio-Visual Affective Descriptions
Multimodal Sentiment Analysis (MSA) holds significant research value in the fields of intelligent human-computer interaction and affective computing. Although existing MSA approaches have made considerable progress, challenges remain in identifying subtle emotional distinctions within audio and visual expressions. In particular, conventional fusion methods have not effectively addressed the difficulty of integrating heterogeneous modality information. To tackle these challenges, we propose a progressive fusion framework based on audio-visual affective descriptions for MSA. Specifically, we design an audio-visual emotional description generator that transforms raw audiovisual data into textual emotional descriptions, thereby effectively highlighting affective features. Subsequently, this emotional description is integrated with the original multimodal features to obtain a richer feature representation. Building upon this, we introduce a three-stage progressive fusion architecture. First, we employ cross-modal transformers to facilitate interactions among modalities and to learn inter-modal dependencies. Second, a gated fusion mechanism is incorporated to effectively eliminate redundant information and further promote interaction and compatibility among features. Finally, an attention mechanism is utilized to dynamically adjust the weights of features from different modalities, enabling effective multimodal sentiment information fusion. Experimental results on widely used sentiment analysis benchmark datasets, including MOSI, MOSEI, and CH-SIMS, underscore significant enhancements compared to state-of-the-art models.
Trustworthy AI Psychotherapy: Multi-Agent LLM Workflow for Counseling and Explainable Mental Disorder Diagnosis
LLM-based agents have emerged as transformative tools capable of executing complex tasks through iterative planning and action, achieving significant advancements in understanding and addressing user needs. Yet, their effectiveness remains limited in specialized domains such as mental health diagnosis, where they underperform compared to general applications. Current approaches to integrating diagnostic capabilities into LLMs rely on scarce, highly sensitive mental health datasets, which are challenging to acquire. These methods also fail to emulate clinicians' proactive inquiry skills, lack multi-turn conversational comprehension, and struggle to align outputs with expert clinical reasoning. To address these gaps, we propose DSM5AgentFlow, the first LLM-based agent workflow designed to autonomously generate DSM-5 Level-1 diagnostic questionnaires. By simulating therapist-client dialogues with specific client profiles, the framework delivers transparent, step-by-step disorder predictions, producing explainable and trustworthy results. This workflow serves as a complementary tool for mental health diagnosis, ensuring adherence to ethical and legal standards. Through comprehensive experiments, we evaluate leading LLMs across three critical dimensions: conversational realism, diagnostic accuracy, and explainability. Our datasets and implementations are fully open-sourced.
Contextual Attention Modulation: Towards Efficient Multi-Task Adaptation in Large Language Models
Large Language Models (LLMs) possess remarkable generalization capabilities but struggle with multi-task adaptation, particularly in balancing knowledge retention with task-specific specialization. Conventional fine-tuning methods suffer from catastrophic forgetting and substantial resource consumption, while existing parameter-efficient methods perform suboptimally in complex multi-task scenarios. To address this, we propose Contextual Attention Modulation (CAM), a novel mechanism that dynamically modulates the representations of self-attention modules in LLMs. CAM enhances task-specific features while preserving general knowledge, thereby facilitating more effective and efficient adaptation. For effective multi-task adaptation, CAM is integrated into our Hybrid Contextual Attention Modulation (HyCAM) framework, which combines a shared, full-parameter CAM module with multiple specialized, lightweight CAM modules, enhanced by a dynamic routing strategy for adaptive knowledge fusion. Extensive experiments on heterogeneous tasks, including question answering, code generation, and logical reasoning, demonstrate that our approach significantly outperforms existing approaches, achieving an average performance improvement of 3.65%. The implemented code and data are available to ease reproducibility.
MRCLQR: A Framework for Logical Query Reasoning Based on Multi-information Relation Constraints
The Knowledge Graph logical reasoning task faces a dual challenge of insufficient semantic coverage from type information and missing structural information from relations. Although type annotations provide semantic priors for entities, their coarse-grained features cannot comprehensively characterize entity attributes; conversely, relational structure can enhance semantic representation, but the incompleteness of edges in real-world graphs limits modeling when relying on a single information source. To address these issues, we propose MRCLQR (Multi-information Relation Constraint-based Logical Query Reasoning), a framework with three core innovations: (1) an Information Semantic Alignment module based on contrastive learning, which achieves cross-modal semantic collaboration via entity-type-structure pairing; (2) a Constraint-aware Relation Encoding method that decomposes relation semantics into domain aggregation features, relation ontology semantics, and range constraint features; and (3) Neural-Symbolic Operators guided by domain constraints, which narrow the reasoning space through a constraint-aware attention mechanism. Experiments on FB15k, FB15k-237, and NELL-995 demonstrate that MRCLQR achieves average MRR scores of 35.8%, 16.2%, and 19.6%, respectively, improving over the strongest baselines by 0.5%, 0.2%, and 0.2%, and exhibits an 8.0% average gain on complex queries involving negation. Ablation studies validate the effectiveness of multi-source collaboration and the curriculum learning strategy. This work offers a novel paradigm for heterogeneous knowledge fusion and logical query reasoning.
Revisiting Long-Tailed Learning: Insights from an Architectural Perspective
Long-Tailed (LT) recognition has been widely studied to tackle the challenge of imbalanced data distributions in real-world applications. However, the design of neural architectures for LT settings has received limited attention, despite evidence showing that architecture choices can substantially affect performance. This paper aims to bridge the gap between LT challenges and neural network design by providing an in-depth analysis of how various architectures influence LT performance. Specifically, we systematically examine the effects of key network components on LT handling, such as topology, convolutions, and activation functions. Based on these observations, we propose two convolutional operations optimized for improved performance. Recognizing that operation interactions are also crucial to network effectiveness, we apply Neural Architecture Search (NAS) to facilitate efficient exploration. We propose LT-DARTS, a NAS method with a novel search space and search strategy specifically designed for LT data. Experimental results demonstrate that our approach consistently outperforms existing architectures across multiple LT datasets, achieving parameter-efficient, state-of-the-art results when integrated with current LT methods.
Temporal Distance-aware Subgoal Generation for Offline Hierarchical Reinforcement Learning
Efficient subgoal generation is essential in offline Hierarchical Reinforcement Learning (HRL) for tackling long-horizon and sparse-reward tasks. Existing approaches often struggle with redundant and inefficient subgoal candidates and fail to maintain meaningful temporal relationships due to fixed-step subgoal sampling. To address these issues, we propose Temporal Distance-Aware Subgoal Generation (TDSG), a novel framework leveraging pre-trained Temporal Distance (TD) representations. TDSG identifies a compact set of anchor states in the TD representation space. These states, evenly spaced at consistent temporal distance intervals and collectively covering all states in the dataset while comprising less than 1% of the entire dataset, serve as the training targets for subgoal generation. This ensures efficient and temporally consistent high-level policy learning. Furthermore, the low-level policy leverages intrinsic rewards derived from the alignment between current states and subgoals in the TD representation space, enabling effective learning even under sparse-reward conditions. Experimental results demonstrate that TDSG achieves consistent performance improvement over prior offline HRL methods across numeric and visual environments. Our code is available at https://github.com/Ptaegeon/TDSG.git
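A simple way to picture the anchor-selection step is a greedy covering in the pre-trained temporal-distance space: states are added as anchors until every state lies within a fixed temporal-distance radius of some anchor. The greedy scan order and radius below are illustrative assumptions, not the paper's exact procedure.

# Greedy covering sketch for anchor-state selection in a temporal-distance embedding space.
import numpy as np

def select_anchor_states(td_embeddings, radius):
    """td_embeddings: (N, d) array of dataset states embedded in the TD space."""
    n = td_embeddings.shape[0]
    uncovered = np.ones(n, dtype=bool)
    anchors = []
    while uncovered.any():
        idx = int(np.flatnonzero(uncovered)[0])    # next uncovered state becomes an anchor
        anchors.append(idx)
        dists = np.linalg.norm(td_embeddings - td_embeddings[idx], axis=1)
        uncovered &= dists > radius                # states within the radius are now covered
    return anchors   # a small fraction of the dataset for a well-chosen radius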
How Fair is FAIR? Understanding LOD Cloud FAIRness Through Correlation Patterns
While the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) and data quality dimensions are widely used to evaluate Linked Data, their interdependencies remain largely unexplored. This paper is grounded in a systematic integration of these two frameworks by mapping data quality dimensions to FAIR sub-principles, revealing how individual features, such as endpoint availability, metadata richness, or use of standard vocabularies, can simultaneously contribute to multiple FAIR goals. Building on this mapping, this paper reports a large-scale, data-driven, longitudinal study of 1,445 datasets from the LOD Cloud, extending KGHeartBeat, an open-source quality assessment framework. This paper quantifies FAIRness at the sub-principle level and computes correlation patterns across five temporal snapshots and nine topical domains. The reported findings reveal that most correlations are positive and statistically significant but vary across time and domain, with only a few stable or persistent relationships. Strong inter-principle correlations, such as those linking metadata standards and security transparency, emerge over time, while intra-principle coherence is often weak. These insights offer concrete guidance for improving FAIR compliance, highlight the importance of domain-aware evaluation, and support the development of more holistic and reproducible FAIR assessment strategies for Linked Data ecosystems.
Dialogues Aspect-based Sentiment Quadruple Extraction via Structural Entropy Minimization Partitioning
Dialogues Aspect-based Sentiment Quadruple Extraction (DiaASQ) aims to extract all target-aspect-opinion-sentiment quadruples from a given multi-round, multi-participant dialogue. Existing methods typically learn word relations across entire dialogues, assuming a uniform distribution of sentiment elements. However, we find that dialogues often contain multiple semantically independent sub-dialogues without clear dependencies between them. Therefore, learning word relationships across the entire dialogue inevitably introduces additional noise into the extraction process. To address this, our method focuses on partitioning dialogues into semantically independent sub-dialogues. Achieving completeness while minimizing these sub-dialogues presents a significant challenge. Simply partitioning based on reply relationships is ineffective. Instead, we propose utilizing a structural entropy minimization algorithm to partition the dialogues. This approach aims to preserve relevant utterances while distinguishing irrelevant ones as much as possible. Furthermore, we introduce a two-step framework for quadruple extraction: first extracting individual sentiment elements at the utterance level, then matching quadruples at the sub-dialogue level. Extensive experiments demonstrate that our approach achieves state-of-the-art performance in DiaASQ with much lower computational costs.
Data-centric Prompt Tuning for Dynamic Graphs
Dynamic graphs have attracted increasing attention due to their ability to model complex and evolving relationships in real-world scenarios. Traditional approaches typically pre-train models using dynamic link prediction and directly apply the resulting node temporal embeddings to specific downstream tasks. However, the significant differences among downstream tasks often lead to performance degradation, especially under few-shot settings. Prompt tuning has emerged as an effective solution to this problem. Existing prompting methods are often strongly coupled with specific model architectures or pretraining tasks, which makes it difficult to adapt to recent or future model designs. Moreover, their exclusive focus on modifying node or temporal features while neglecting spatial structural information leads to limited expressiveness and degraded performance. To address these limitations, we propose DDGPrompt, a data-centric prompting framework designed to effectively refine pre-trained node embeddings at the input data level, enabling better adaptability to diverse downstream tasks. We first define a unified node expression feature matrix that aggregates all relevant temporal and structural information of each node, ensuring compatibility with a wide range of dynamic graph models. Then, we introduce three prompt matrices (temporal bias, edge weight, and feature mask) to adjust the feature matrix completely, achieving task-specific adaptation of node embeddings. We evaluate DDGPrompt under a strict few-shot setting on four public dynamic graph datasets. Experimental results demonstrate that our method significantly outperforms traditional methods and prompting approaches in scenarios with limited labels and cold-start conditions.
DistillCaps: Enhancing Audio-Language Alignment in Captioning via Retrieval-Augmented Knowledge Distillation
Automated audio captioning (AAC) benefits from incorporating external context to interpret complex sounds, but doing so with retrieval-augmented generation (RAG) at inference is sometimes infeasible due to data availability or incurs significant latency and complexity. We propose DistillCaps, a novel training-time framework that leverages RAG to guide knowledge distillation for improved audio-language alignment, while lessening the reliance on retrieval during inference. In our framework, a RAG-equipped teacher model retrieves relevant textual information (e.g., similar captions) for each audio clip and uses it during training to generate context-enriched captions. Simultaneously, a student model is trained to imitate this teacher, learning to produce high-quality captions from audio alone. We further introduce a Fast Fourier Transform (FFT) adapter in the audio encoder to inject frequency-domain features, enhancing the quality of audio representations before feeding them into the language model. The result is an efficient captioning model that retains RAG's contextual benefits without its deployment overhead. On standard AAC benchmarks (AudioCaps and Clotho), DistillCaps achieves performance competitive with or exceeding prior RAG-based systems despite using no retrieval at test time. Notably, our distilled model matches state-of-the-art captioning results under real-time settings, and when optionally allowing retrieval, it even outperforms previous models by up to 4% on the Clotho benchmark in the in-distribution setting, demonstrating the effectiveness of RAG-guided distillation for audio-language alignment. Code and dataset are available at https://github.com/pgthinh/DistillCaps.
Oblivious Johnson--Lindenstrauss embeddings for compressed Tucker decompositions
Emphasis in the tensor literature on random embeddings (tools for low-distortion dimension reduction) for the canonical polyadic (CP) tensor decomposition has left analogous results for the more expressive Tucker decomposition comparatively lacking. This work establishes general Johnson--Lindenstrauss (JL) guarantees for the estimation of Tucker decompositions when an oblivious random embedding is applied along each mode. When these embeddings are drawn from a JL-optimal family, the decomposition can be estimated within ε relative error under restrictions on the embedding dimension that are in line with recent CP results. We implement a higher-order orthogonal iteration (HOOI) decomposition algorithm with structured random embeddings to demonstrate the practical benefits of this approach and its potential to improve the accessibility of otherwise prohibitive tensor analyses. On moderately large face image and fMRI neuroimaging datasets, empirical results show that substantial dimension reduction is possible with minimal increase in reconstruction error relative to traditional HOOI (at most 15% larger error and 50% lower computation time for large models with 50% dimension reduction along each mode). Especially for large tensors, our method outperforms traditional higher-order singular value decomposition (HOSVD) and recently proposed TensorSketch methods.
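To illustrate the mode-wise embedding step, the sketch below applies an oblivious Gaussian JL embedding along each mode of a 3-way tensor and then runs a truncated HOSVD on the much smaller sketched tensor; the embedding family, sketch sizes, and the mapping back to the original space (handled by the paper's HOOI variant) are illustrative assumptions.

# Mode-wise Gaussian sketching followed by truncated HOSVD of the compressed tensor.
import numpy as np

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    T = np.moveaxis(T, mode, 0)
    shp = T.shape
    out = M @ T.reshape(shp[0], -1)
    return np.moveaxis(out.reshape((M.shape[0],) + shp[1:]), 0, mode)

def sketched_hosvd(T, sketch_dims, ranks, seed=0):
    rng = np.random.default_rng(seed)
    S = T.copy()
    for mode, m in enumerate(sketch_dims):          # oblivious JL embedding per mode
        Omega = rng.normal(size=(m, S.shape[mode])) / np.sqrt(m)
        S = mode_product(S, Omega, mode)
    factors = []
    for mode, r in enumerate(ranks):                # truncated HOSVD of the sketch
        unfold = np.moveaxis(S, mode, 0).reshape(S.shape[mode], -1)
        U = np.linalg.svd(unfold, full_matrices=False)[0][:, :r]
        factors.append(U)
    core = S
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)
    return core, factors                            # decomposition of the sketched tensor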
Evaluating Robustness of LLMs in Question Answering on Multilingual Noisy OCR Data
Optical Character Recognition (OCR) plays a crucial role in digitizing historical and multilingual documents, yet OCR errors (imperfect extraction of text, including character insertion, deletion, and substitution) can significantly impact downstream tasks like question answering (QA). In this work, we conduct a comprehensive analysis of how OCR-induced noise affects the performance of multilingual QA systems. To support this analysis, we introduce MultiOCR-QA, a multilingual QA dataset comprising 50K question-answer pairs across three languages: English, French, and German. The dataset is curated from OCR-ed historical documents, which include different levels and types of OCR noise. We then evaluate how different state-of-the-art Large Language Models (LLMs) perform under different error conditions, focusing on three major OCR error types. Our findings show that QA systems are highly prone to OCR-induced errors and perform poorly on noisy OCR text. By comparing model performance on clean versus noisy texts, we provide insights into the limitations of current approaches and emphasize the need for more noise-resilient QA systems in historical digitization contexts.
Do Recommender Systems Really Leverage Multimodal Content? A Comprehensive Analysis on Multimodal Representations for Recommendation
Multimodal Recommender Systems aim to improve recommendation accuracy by integrating heterogeneous content, such as images and textual metadata. While effective, it remains unclear whether their gains stem from true multimodal understanding or from increased model complexity. This work investigates the role of multimodal item embeddings, emphasizing the semantic informativeness of the representations. Initial experiments reveal that embeddings from standard extractors (e.g., ResNet50, Sentence-BERT) enhance performance, but rely on modality-specific encoders and ad hoc fusion strategies that lack control over cross-modal alignment. To overcome these limitations, we leverage Large Vision-Language Models (LVLMs) to generate multimodal-by-design embeddings via structured prompts. This approach yields semantically aligned representations without requiring any fusion. Experiments across multiple settings show notable performance improvements. Furthermore, LVLM embeddings offer a distinctive advantage: they can be decoded into structured textual descriptions, enabling direct assessment of their multimodal comprehension. When such descriptions are incorporated as side content into recommender systems, they improve recommendation performance, empirically validating the semantic alignment encoded in LVLM outputs. Our study highlights the importance of semantically rich representations and positions LVLMs as a compelling foundation for building robust and meaningful multimodal representations in recommendation tasks.
Constraint Back-translation Improves Complex Instruction Following of Large Language Models
Large language models (LLMs) struggle to follow instructions with complex constraints in format, length, etc. Following the conventional instruction-tuning practice, previous works conduct post-training on complex instruction-response pairs generated by feeding complex instructions to advanced LLMs. However, even advanced LLMs cannot follow complex instructions well, thus limiting the quality of generated data. In this work, we find that existing datasets inherently contain implicit complex constraints and propose a novel data generation technique, constraint back-translation. Specifically, we take the high-quality instruction-response pairs in existing datasets and only adopt advanced LLMs to add complex constraints already met by the responses to the instructions, which naturally reduces costs and data noise. In the experiments, we adopt Llama3-70B-Instruct to back-translate constraints and create a high-quality complex instruction-response dataset, named Crab. We show that post-training on Crab improves multiple backbone LLMs' complex instruction-following ability, evaluated on extensive instruction-following benchmarks. We further find that constraint back-translation also serves as a useful auxiliary training objective in post-training. Our code, data, and models are released to facilitate future research.
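As a rough illustration of the back-translation step, the sketch below builds the kind of prompt such a pipeline might send to a strong LLM; the prompt wording and function name are our own illustrative choices, not the paper's.

    def build_backtranslation_prompt(instruction: str, response: str) -> str:
        # Hypothetical prompt: ask an LLM to list constraints (format, length,
        # style, ...) that the given response ALREADY satisfies, so they can be
        # appended to the original instruction without changing the response.
        return (
            "Below is an instruction and a high-quality response.\n"
            "List constraints on format, length, style, or content that the "
            "response already satisfies. Only state constraints that are "
            "verifiably met by the response.\n\n"
            f"Instruction:\n{instruction}\n\n"
            f"Response:\n{response}\n\n"
            "Constraints:"
        )

    # The returned string would be sent to an advanced LLM (the paper uses
    # Llama3-70B-Instruct); the listed constraints are appended to the original
    # instruction, yielding a complex instruction whose reference response is
    # already known to comply.
    print(build_backtranslation_prompt("Summarize the article.", "- Point one\n- Point two\n- Point three"))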
Personalized Federated Recommendation with Multi-Faceted User Representation and Global Consistent Prototype
Personalized recommender systems are critical for enhancing user engagement across a range of digital platforms. However, conventional approaches rely heavily on centralized data collection, raising significant privacy concerns. Personalized federated recommender systems (PFRS) address these concerns by decentralizing model training, ensuring user data privacy. Despite this progress, existing methods still struggle with capturing the multi-faceted nature of users and transferring global knowledge effectively. In this work, we propose FedMUR, a novel federated recommendation framework that models user representation as a Gaussian mixture distribution, capturing users' multi-faceted characteristics. Each Gaussian component corresponds to a distinct interest facet, with adaptive mixture weights representing the user's preference intensity toward each facet. To facilitate knowledge transfer, FedMUR constructs global consistent prototypes that encode shared behavioral trends across users via popularity-weighted optimal transport. These prototypes enhance local models by injecting globally shared patterns into personalized representation learning. Extensive experiments across several real-world datasets demonstrate that FedMUR significantly outperforms existing state-of-the-art federated recommendation systems.
TCFMamba: Trajectory Collaborative Filtering Mamba for Debiased Point-of-Interest Recommendation
Next Point-of-Interest (POI) recommendation, which predicts users' future destinations based on their potential interests, has emerged as a critical task in location-based social networks (LBSNs). However, this task remains challenged by issues such as popularity bias, exposure bias, and limited representational capacity, all of which impede the accurate modeling of users and POIs, thereby restricting balanced and effective recommendations. Therefore, we propose Trajectory Collaborative Filtering Mamba (TCFMamba), which integrates two specially designed modules, i.e., Joint Learning of Static and Dynamic Representations (JLSDR) and Preference State Mamba Network (PSMN), for debiased Point-of-Interest recommendation.
Invariant Treatment Effect Estimation via Consistent Constraints and Information Bottleneck
Considerable research has focused on the challenge of estimating individual treatment effects (ITE) from observational data, primarily due to the presence of treatment assignment bias. To address this, practitioners often adjust for relevant covariates to correct potential biases. However, indiscriminately adjusting for all observed covariates risks including 'bad controls', i.e., variables that can introduce bias when conditioned upon, thereby compromising ITE estimation accuracy. To tackle this issue, we propose Invariant Treatment Effect Estimation via Consistent Constraints and Information Bottleneck (CIBITE). This method mitigates the impact of bad controls by leveraging diverse environments and adjusting for confounding factors in observational data, enabling robust ITE estimation. We introduce a novel invariant causal prediction framework to eliminate bad controls while retaining sufficient information for confounding adjustment. This is achieved by imposing consistent constraints on both the representation and output layers of the neural network. Additionally, the Information Bottleneck is employed to reduce the influence of pseudo-invariant features. To further address confounding, we propose a balanced representation learning framework using adversarial training. Extensive experiments on synthetic, semi-simulated, and real-world datasets demonstrate the effectiveness of our approach. The proposed method significantly outperforms state-of-the-art ITE estimation techniques and existing Invariant Risk Minimization (IRM)-based methods.
OBDD-NET: End-to-End Learning of Ordered Binary Decision Diagrams
Learning Ordered Binary Decision Diagrams (OBDDs) from large-scale datasets is an important topic in explainable artificial intelligence. However, existing search-based methods are still limited in scalability with respect to dataset size, since they must explicitly encode the satisfaction of all examples in a dataset. To tackle this challenge, we introduce an OBDD encoding method to parameterize a neural network. This method removes the need to explicitly encode the satisfaction of all examples in a dataset while leveraging mini-batch training techniques to enhance learning efficiency. Our main theoretical contribution is to prove that our approach enables the simulation of OBDD inference within a continuous space. In addition, we identify a faithful OBDD encoding that fulfills the properties required by OBDDs, allowing an OBDD to be read off directly from the learned parameter assignment. With faithful OBDD encoding, we present an end-to-end neural model named OBDD-NET, capable of coping with large-scale datasets. Experimental results exhibit better scalability and competitive prediction performance of OBDD-NET compared to state-of-the-art OBDD learners. Valuable insights about faithful OBDD encoding are derived from the ablation study. The implementation is available at: https://github.com/jmq-design/OBDD-NET.
UniROM: Unifying Online Advertising Ranking as One Model
The Multi-stage Cascading Architecture (MCA), widely adopted in industrial advertising systems to balance efficiency and effectiveness, suffers from critical limitations: 1) ranking inconsistency caused by conflicting modeling objectives and capacity gaps across stages, and 2) the inability to model externalities, i.e., mutual influences among candidate ads in ranking stages. These issues degrade system performance and lead to suboptimal platform revenue. In this paper, we present UniROM, an end-to-end generative architecture that Unifies online advertising Ranking as One Model. UniROM replaces cascaded stages with a single model to directly generate optimal ad sequences from the full candidate ad corpus in location-based services (LBS). The primary challenges associated with this approach stem from high costs of feature processing and computational bottlenecks in modeling externalities of large-scale candidate pools. To address these challenges, UniROM introduces an algorithm and engine co-designed hybrid feature service to decouple user and ad feature processing, reducing latency while preserving expressiveness. To efficiently extract intra- and cross-sequence mutual information, we propose RecFormer with an innovative cluster-attention mechanism as its core architectural component. Furthermore, we propose a bi-stage training strategy that integrates pre-training with reinforcement learning-based post-training to meet sophisticated platform and advertising objectives. Extensive offline evaluations on public benchmarks and large-scale online A/B testing on an industrial advertising platform demonstrate the superior performance of UniROM over state-of-the-art MCAs.
GRLND: A Graph Reinforcement Learning Framework for Network Dismantling
Network Dismantling (ND) seeks to identify the smallest subset of nodes whose removal fragments a network into disconnected components. Traditional methods rely on fixed centrality heuristics or supervised models trained on synthetic data, often failing to generalize across diverse topologies. We introduce GRLND, a Graph Reinforcement Learning framework that enables fully unsupervised, structure-aware dismantling through end-to-end optimization. GRLND formulates ND as a single-step Markov Decision Process (MDP), where the action is a binary mask indicating the nodes to be removed, allowing the agent to generate a complete dismantling strategy in a single forward pass while accounting for the joint effect of multiple node removals. The framework combines a Graph Convolutional Network (GCN) for topological encoding with a stochastic policy trained via the REINFORCE algorithm. Additionally, we design a task-specific reward that balances connectivity disruption and removal sparsity, guiding the policy toward compact yet high-impact dismantling solutions. Experiments on both synthetic and real-world networks show that GRLND consistently outperforms classical heuristics and recent learning-based methods, achieving strong generalization without requiring labels or pretraining.
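A toy, hedged sketch of the single-step REINFORCE formulation: a two-parameter logistic policy over node degree stands in for the paper's GCN encoder, and the reward weights and hyperparameters below are illustrative guesses rather than the authors' settings.

    import numpy as np
    import networkx as nx

    def reward(G, mask, lam=0.05):
        # Connectivity disruption (smaller largest remaining component is
        # better) minus a sparsity penalty on the number of removed nodes.
        kept = [v for v, m in zip(G.nodes(), mask) if m == 0]
        lcc = max((len(c) for c in nx.connected_components(G.subgraph(kept))), default=0)
        return -lcc / G.number_of_nodes() - lam * mask.mean()

    # Toy setup: node degree as the only feature; the paper uses a GCN encoder
    # and a task-specific reward instead.
    G = nx.barabasi_albert_graph(200, 2, seed=0)
    feats = np.array([d for _, d in G.degree()], dtype=float)
    feats = (feats - feats.mean()) / (feats.std() + 1e-8)
    w, b = 0.0, -2.0                                   # policy parameters
    rng = np.random.default_rng(0)
    lr, baseline = 0.5, 0.0

    for step in range(300):
        p = 1.0 / (1.0 + np.exp(-(w * feats + b)))     # per-node removal probabilities
        mask = (rng.random(len(p)) < p).astype(float)  # single-step action: binary mask
        R = reward(G, mask)
        adv = R - baseline
        # REINFORCE: d log-prob / d logit for a Bernoulli sample is (mask - p).
        w += lr * adv * np.mean((mask - p) * feats)
        b += lr * adv * np.mean(mask - p)
        baseline = 0.9 * baseline + 0.1 * R            # moving-average baseline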
Efficient Multimodal Streaming Recommendation via Expandable Side Mixture-of-Experts
Streaming recommender systems (SRSs) are widely deployed in real-world applications, where user interests shift and new items arrive over time. As a result, effectively capturing users' latest preferences is challenging, as interactions reflecting recent interests are limited and new items often lack sufficient feedback. A common solution is to enrich item representations using multimodal encoders (e.g., BERT or ViT) to extract visual and textual features. However, these encoders are pretrained on general-purpose tasks: they are not tailored to user preference modeling, and they overlook the fact that user tastes toward modality-specific features such as visual styles and textual tones can also drift over time. This presents two key challenges in streaming scenarios: the high cost of fine-tuning large multimodal encoders, and the risk of forgetting long-term user preferences due to continuous model updates. To tackle these challenges, we propose Expandable Side Mixture-of-Experts (XSMoE), a memory-efficient framework for multimodal streaming recommendation. XSMoE attaches lightweight side-tuning modules consisting of expandable expert networks to frozen pretrained encoders and incrementally expands them in response to evolving user feedback. A gating router dynamically combines expert and backbone outputs, while a utilization-based pruning strategy maintains model compactness. By learning new patterns through expandable experts without overwriting previously acquired knowledge, XSMoE effectively captures both cold start and shifting preferences in multimodal features. Experiments on three real-world datasets demonstrate that XSMoE outperforms state-of-the-art baselines in both recommendation quality and computational efficiency.
Causality-aware Graph Aggregation Weight Estimator for Popularity Debiasing in Top-K Recommendation
Graph-based recommender systems leverage neighborhood aggregation to generate node representations, which is highly sensitive to popularity bias, resulting in an echo effect during information propagation. Existing graph-based debiasing solutions refine the aggregation process with attempts such as edge reconstruction or weight adjustment. However, these methods remain inadequate in fully alleviating popularity bias. Specifically, this is because 1) they provide no insights into graph aggregation rationality, thus lacking an optimality guarantee; 2) they fail to properly balance the training and debiasing processes, which undermines their effectiveness. In this paper, we propose a novel approach to mitigate popularity bias through rational modeling of the graph aggregation process. We reveal that graph aggregation is a special form of backdoor adjustment in causal inference, where the aggregation weight corresponds to the historical interaction likelihood distribution. Based on this insight, we devise an encoder-decoder architecture, namely Causality-aware Graph Aggregation Weight Estimator for Debiasing (CAGED), to approximate the unbiased aggregation weight by optimizing the evidence lower bound of the interaction likelihood. In order to enhance the debiasing effectiveness during early training stages, we further design a momentum update strategy that incrementally refines the aggregation weight matrix. Extensive experiments on three datasets demonstrate that CAGED outperforms existing graph-based debiasing methods. Our implementation is available at https://github.com/QueYork/CAGED.
ITL-LIME: Instance-Based Transfer Learning for Enhancing Local Explanations in Low-Resource Data Settings
Explainable Artificial Intelligence (XAI) methods, such as Local Interpretable Model-Agnostic Explanations (LIME), have advanced the interpretability of black-box machine learning models by approximating their behavior locally using interpretable surrogate models. However, LIME's inherent randomness in perturbation and sampling can lead to locality and instability issues, especially in scenarios with limited training data. In such cases, data scarcity can result in the generation of unrealistic variations and samples that deviate from the true data manifold. Consequently, the surrogate model may fail to accurately approximate the complex decision boundary of the original model. To address these challenges, we propose a novel Instance-based Transfer Learning LIME framework (ITL-LIME) that enhances explanation fidelity and stability in data-constrained environments. ITL-LIME introduces instance transfer learning into the LIME framework by leveraging relevant real instances from a related source domain to aid the explanation process in the target domain. Specifically, we employ clustering to partition the source domain into clusters with representative prototypes. Instead of generating random perturbations, our method retrieves pertinent real source instances from the source cluster whose prototype is most similar to the target instance. These are then combined with the target instance's neighboring real instances. To define a compact locality, we further construct a contrastive learning-based encoder as a weighting mechanism to assign weights to the instances from the combined set based on their proximity to the target instance. Finally, these weighted source and target instances are used to train the surrogate model for explanation purposes. Experimental evaluation with real-world datasets demonstrates that ITL-LIME greatly improves the stability and fidelity of LIME explanations in scenarios with limited data. Our code is available at https://github.com/rehanrazaa/ITL-LIME.
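To make the retrieval-and-weighting idea concrete, here is a hedged scikit-learn sketch; the prototype clustering, kernel width, and ridge surrogate are stand-ins (the paper weights instances with a learned contrastive encoder rather than raw-feature distances).

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import Ridge

    def itl_lime_explain(x, X_src, X_tgt, black_box,
                         n_clusters=10, n_neighbors=20, width=1.0, seed=0):
        # Cluster the source domain and keep the cluster whose prototype
        # (centroid) is closest to the target instance x.
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_src)
        proto_id = np.argmin(np.linalg.norm(km.cluster_centers_ - x, axis=1))
        src_pool = X_src[km.labels_ == proto_id]
        # Add x's nearest real neighbors from the (small) target domain.
        tgt_pool = X_tgt[np.argsort(np.linalg.norm(X_tgt - x, axis=1))[:n_neighbors]]
        Z = np.vstack([src_pool, tgt_pool, x[None, :]])
        # The paper weights instances with a learned contrastive encoder; an RBF
        # kernel on raw features is used here as a simple stand-in.
        w = np.exp(-np.linalg.norm(Z - x, axis=1) ** 2 / width ** 2)
        y = black_box(Z)                   # black-box predictions to be mimicked
        surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=w)
        return surrogate.coef_             # local feature attributions for x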
Modality Alignment with Multi-scale Bilateral Attention for Multimodal Recommendation
Multimodal recommendation systems are increasingly becoming foundational technologies for e-commerce and content platforms, enabling personalized services by jointly modeling users' historical behaviors and the multimodal features of items (e.g., visual and textual). However, most existing methods rely on either static fusion strategies or graph-based local interaction modeling, facing two critical limitations: (1) insufficient ability to model fine-grained cross-modal associations, leading to suboptimal fusion quality; and (2) a lack of global distribution-level consistency, causing representational bias. To address these, we propose MambaRec, a novel framework that integrates local feature alignment and global distribution regularization via attention-guided learning. At its core, we introduce the Dilated Refinement Attention Module (DREAM), which uses multi-scale dilated convolutions with channel-wise and spatial attention to align fine-grained semantic patterns between visual and textual modalities. This module captures hierarchical relationships and context-aware associations, improving cross-modal semantic modeling. Additionally, we apply Maximum Mean Discrepancy (MMD) and contrastive loss functions to constrain global modality alignment, enhancing semantic consistency. This dual regularization reduces mode-specific deviations and boosts robustness. To improve scalability, MambaRec employs a dimensionality reduction strategy to lower the computational cost of high-dimensional multimodal features. Extensive experiments on real-world e-commerce datasets show that MambaRec outperforms existing methods in fusion quality, generalization, and efficiency. Our code has been made publicly available at https://github.com/rkl71/MambaRec.
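For reference, a minimal NumPy sketch of the global distribution-level regularizer mentioned above: the squared MMD with a Gaussian kernel between a batch of visual embeddings and a batch of textual embeddings (the bandwidth is an arbitrary choice, not a value from the paper).

    import numpy as np

    def gaussian_mmd(X, Y, sigma=1.0):
        # Squared Maximum Mean Discrepancy with a Gaussian kernel between two
        # batches of embeddings (rows = samples).
        def k(A, B):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2.0 * sigma ** 2))
        return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

    # Penalizing gaussian_mmd(visual_batch, textual_batch) during training pulls
    # the two modality distributions together at the batch level, complementing
    # instance-level contrastive alignment.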
Fine-Grained Emotion Recognition via In-Context Learning
Fine-grained emotion recognition aims to identify the emotional type in queries through reasoning and decision-making processes, playing a crucial role in various systems. Recent methods use In-Context Learning (ICL), enhancing the representation of queries in the reasoning process through semantically similar examples, while further improving emotion recognition by explaining the reasoning mechanisms. However, these methods enhance the reasoning process but overlook the decision-making process. This paper investigates decision-making in fine-grained emotion recognition through prototype theory. We show that ICL relies on similarity matching between query representations and emotional prototypes within the model, where emotion-accurate representations are critical. However, semantically similar examples often introduce emotional discrepancies, hindering accurate representations and causing errors. To address this, we propose Emotion In-Context Learning (EICL), which introduces emotionally similar examples and uses a dynamic soft-label strategy to improve query representations in the emotion reasoning process. A two-stage exclusion strategy is then employed to assess similarity from multiple angles, further optimizing the decision-making process. Extensive experiments show that EICL significantly outperforms ICL on multiple datasets.
Is This News Still Interesting to You?: Lifetime-aware Interest Matching for News Recommendation
Personalized news recommendation aims to deliver news articles aligned with users' interests, serving as a key solution to alleviate the problem of information overload on online news platforms. While prior work has improved interest matching through refined representations of news and users, the following time-related challenges remain underexplored: (C1) leveraging the age of clicked news to infer users' interest persistence, and (C2) modeling the varying lifetime of news across topics and users. To jointly address these challenges, we propose a novel Lifetime-aware Interest Matching framework for nEws recommendation, named LIME, which incorporates three key strategies: (1) User-Topic lifetime-aware age representation to capture the relative age of news with respect to a user-topic pair, (2) Candidate-aware lifetime attention for generating temporally aligned user representation, and (3) Freshness-guided interest refinement for prioritizing valid candidate news at prediction time. Extensive experiments on two real-world datasets demonstrate that LIME consistently outperforms a wide range of state-of-the-art news recommendation methods, and its model-agnostic strategies significantly improve recommendation accuracy.
Empirical Study of Over-Squashing in GNNs and Causal Estimation of Rewiring Strategies
Graph neural networks (GNNs) have exhibited state-of-the-art performance across a wide range of domains. Yet message-passing GNNs suffer from over-squashing, the exponential compression of long-range information from distant nodes, which limits expressivity. Rewiring techniques can ease this bottleneck, but their practical impacts are unclear due to the lack of a direct empirical over-squashing metric. We propose a topology-focused method for assessing over-squashing between node pairs using the decay rate of their mutual sensitivity. We then extend these pairwise assessments to graph-level statistics. Coupling these metrics with a within-graph causal design, we quantify how rewiring strategies affect over-squashing on diverse graph- and node-classification benchmarks. Our extensive empirical analyses show that most graph classification datasets suffer from over-squashing (to varying extents), and that rewiring effectively mitigates it, though the degree of mitigation, and its translation into performance gains, varies by dataset and method. We also find that over-squashing is less notable in node classification datasets, where rewiring often increases over-squashing and performance variations are uncorrelated with over-squashing changes. These findings suggest that rewiring is most beneficial when over-squashing is both substantial and corrected with restraint, while overly aggressive rewiring, or rewiring applied to minimally over-squashed graphs, is unlikely to help and may even harm performance. Our plug-and-play diagnostic tool lets practitioners decide whether rewiring is likely to pay off.
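One simple way to instantiate a pairwise sensitivity measurement, assuming the common Jacobian-style bound in which a node's influence after k layers is proxied by entries of powers of the normalized adjacency; this is an illustrative stand-in for the paper's decay-rate metric, not its exact definition.

    import numpy as np
    import networkx as nx

    def per_hop_attenuation(G, u, v):
        # Illustrative pairwise over-squashing proxy: the (u, v) entry of
        # A_hat^r (A_hat = normalized adjacency with self-loops, r = shortest
        # path length) bounds how much of v's input signal can reach u after r
        # message-passing layers. We report the average attenuation per hop,
        # -log(entry) / r; larger values indicate stronger squashing.
        A = nx.to_numpy_array(G) + np.eye(G.number_of_nodes())
        d = A.sum(axis=1)
        A_hat = A / np.sqrt(np.outer(d, d))
        r = nx.shortest_path_length(G, u, v)
        entry = np.linalg.matrix_power(A_hat, r)[u, v]
        return -np.log(entry) / r

    G = nx.path_graph(8)                       # long chains squash heavily
    print(per_hop_attenuation(G, 0, 7), per_hop_attenuation(G, 0, 1))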
On Verifiable Legal Reasoning: A Multi-Agent Framework with Formalized Knowledge Representations
Legal reasoning requires both precise interpretation of statutory language and consistent application of complex rules, presenting significant challenges for AI systems. This paper introduces a modular multi-agent framework that decomposes legal reasoning into distinct knowledge acquisition and application stages. In the first stage, specialized agents extract legal concepts and formalize rules to create verifiable intermediate representations of statutes. The second stage applies this knowledge to specific cases through three steps: analyzing queries to map case facts onto the ontology schema, performing symbolic inference to derive logically entailed conclusions, and generating final answers using a programmatic implementation that operationalizes the ontological knowledge. This bridging of natural language understanding with symbolic reasoning provides explicit and verifiable inspection points, significantly enhancing transparency compared to end-to-end approaches. Evaluation on statutory tax calculation tasks demonstrates substantial improvements, with foundational models achieving 76.4% accuracy compared to 18.8% baseline performance, effectively narrowing the performance gap between reasoning and foundational models. These findings suggest that modular architectures with formalized knowledge representations can make sophisticated legal reasoning more accessible through computationally efficient models while enhancing consistency and explainability in AI legal reasoning, establishing a foundation for future research into more transparent, trustworthy, and effective AI systems for the legal domain.
STGS: Spatio-temporal Graph Sparsification Using Reinforcement Learning
Spatio-temporal graphs encode dynamic interactions across space and time, but their size and complexity pose challenges for analysis and computation. Graph sparsification provides an effective solution to these issues by reducing the number of edges while preserving the essential structural and dynamic properties of the network. This reduction is crucial for enhancing the interpretability of complex graphs, revealing hidden patterns, and enabling more efficient computational analysis. However, real-world graphs often exhibit continuous spatial and temporal evolution, which most existing sparsification algorithms, primarily designed for static graphs, fail to address. We introduce STGS (Spatio-Temporal Graph Sparsification), a reinforcement learning-based framework for sparsifying spatio-temporal graphs. By learning to prune edges while preserving key spatio-temporal patterns, STGS enables efficient analysis of evolving systems. Experiments on real-world datasets demonstrate that STGS outperforms existing methods in both structural preservation and downstream forecasting tasks.
General Adaptive Memory Allocation for Learned Bloom Filters
Membership testing, which determines whether an element belongs to a set, is widely used in fields like database systems and network applications. Bloom Filters (BFs) can solve this problem efficiently but suffer from high False Positive Rates (FPRs) and large memory requirements for massive datasets. Learned Bloom Filters (LBFs), combining a learning model with a backup Bloom Filter, mitigate these issues by capturing data distributions. However, the critical problem of memory allocation between the learning model and the backup filter has usually been overlooked, despite its significant impact on LBF performance under constrained budgets. To this end, we propose Gama, to the best of our knowledge the first General Adaptive Memory Allocation framework for LBFs. Gama introduces two memory allocation strategies: a Loop-Based method and a Bayesian-Based method. The Loop-Based method evaluates all configurations at each training epoch, making it well-suited for scenarios with tight memory constraints. However, it faces efficiency challenges under large memory budgets due to the requirement for exhaustive evaluations. In contrast, the Bayesian-Based method efficiently navigates the search space through probabilistic exploration, which reduces the number of configurations evaluated and significantly improves efficiency while maintaining FPRs. Furthermore, we propose a hybrid approach that combines their strengths to dynamically adapt to different constraints. Experiments on three real-world datasets show that Gama can achieve a relative performance improvement of 69% in terms of FPR in the best case.
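A hedged sketch of the loop-style search: for each split of the memory budget, a user-supplied train_model stub (a placeholder, not part of the paper) reports the learned model's false-positive rate and the number of keys it misses, and the combined false-positive rate follows the standard learned-Bloom-filter analysis.

    import math

    def backup_bf_fpr(bits, n_keys):
        # Standard Bloom-filter false-positive rate with the optimal number of
        # hash functions for the given bits-per-key ratio.
        if n_keys == 0:
            return 0.0
        if bits <= 0:
            return 1.0
        k = max(1, round(bits / n_keys * math.log(2)))
        return (1.0 - math.exp(-k * n_keys / bits)) ** k

    def loop_based_allocation(total_bits, candidate_fractions, train_model):
        # For each split of the budget between the learned model and the backup
        # filter, query `train_model(bits)` (a hypothetical stub returning
        # (model_fpr, n_false_negatives)), combine it with the backup filter's
        # FPR, and keep the best split.
        best = None
        for frac in candidate_fractions:               # e.g. [0.1, 0.2, ..., 0.9]
            model_bits = int(frac * total_bits)
            backup_bits = total_bits - model_bits
            model_fpr, n_fn = train_model(model_bits)  # misses go to the backup BF
            overall = model_fpr + (1.0 - model_fpr) * backup_bf_fpr(backup_bits, n_fn)
            if best is None or overall < best[1]:
                best = (frac, overall)
        return best                                    # (fraction for the model, FPR)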
Retrieval-Augmented Image Captioning via Synthesized Entity-Aware Knowledge Representations
Retrieval-Augmented Image Captioning enhances the model's understanding of real-world images by retrieving external knowledge. Existing methods mainly use original captions or isolated entities related to the query image to help generate captions. However, these methods make the model either imitate the caption style or fail to capture the relationships between entities, resulting in a lack of diversity or accuracy in the generated captions. To address these issues, we propose SEAR, a novel framework that utilizes external Synthesized Entity-Aware knowledge Representations to improve captioning performance. Specifically, SEAR clusters images based on scene-level and entity-level features, synthesizes each cluster into a representative image that serves as a retrieval index, and simultaneously utilizes a large model to extract and supplement structured knowledge graphs from the corresponding cluster captions. Furthermore, we design a knowledge-graph pruner that prunes the knowledge graph by retaining the subgraphs most relevant to the query image. By undertaking these steps in an integrated manner, SEAR enables the model to acquire non-redundant and structured information for generating captions while avoiding data-related privacy issues. Extensive experiments on MSCOCO, Flickr30k, and NoCaps demonstrate the effectiveness of our method both in-domain and out-of-domain, outperforming existing lightweight RAIC methods and remaining competitive with heavyweight models.
EmoPerso: Enhancing Personality Detection with Self-Supervised Emotion-Aware Modelling
Personality detection from text is commonly performed by analysing users' social media posts. However, existing methods heavily rely on large-scale annotated datasets, making it challenging to obtain high-quality personality labels. Moreover, most studies treat emotion and personality as independent variables, overlooking their interactions. In this paper, we propose a novel self-supervised framework, EmoPerso, which improves personality detection through emotion-aware modelling. EmoPerso first leverages generative mechanisms for synthetic data augmentation and rich representation learning. It then extracts pseudo-labeled emotion features and jointly optimizes them with personality prediction via multi-task learning. A cross-attention module is employed to capture fine-grained interactions between personality traits and the inferred emotional representations. To further refine relational reasoning, EmoPerso adopts a self-taught strategy to enhance the model's reasoning capabilities iteratively. Extensive experiments on two benchmark datasets demonstrate that EmoPerso surpasses state-of-the-art models. The source code is available at https://github.com/slz0925/EmoPerso.
ROKAN: Toward Interpretable and Domain-Robust Memory Behavior Modeling
Memory behavior modeling aims to predict individual performance over time and uncover underlying cognitive mechanisms. However, existing approaches often struggle to balance predictive accuracy, domain generalization, and model interpretability. To address this, we propose ROKAN, a cognitively inspired and symbolically interpretable memory modeling framework. Based on the Multiscale Context Model, ROKAN formalizes the evolution of memory traces as a differentiable Ordinary Differential Equation system, implemented via Kolmogorov-Arnold Networks to derive human-readable symbolic expressions. To enhance generalization across heterogeneous learning domains, we design an Adaptive Domain-Aware loss function, which integrates Empirical Risk Minimization with Distributionally Robust Optimization through dynamic domain-aware weighting. Our experiments demonstrate that ROKAN significantly outperforms existing mainstream methods in both predictive accuracy and domain generalization. The symbolic expressions were found to exhibit formal consistency with classical memory theories, which lends support to the model's theoretical assumptions and empirical performance, and provides a new pathway toward theoretically grounded white-box memory modeling. Our code is available at https://github.com/hellowads/ROKAN.
Towards Few-shot Chemical Reaction Outcome Prediction
Accurate chemical reaction prediction is essential for drug discovery and synthetic planning. However, this task becomes particularly challenging in low-data scenarios, where novel reaction types lack sufficient training examples. To address this challenge, we propose FewRxn, a novel model-agnostic few-shot reaction prediction framework that enables rapid adaptation to unseen reaction types using only a few training samples. FewRxn integrates several key innovations, including segmentation masks for enhanced reactant representation, fingerprint embeddings for richer molecular context, and task-aware meta-learning for effective knowledge transfer. Through extensive evaluations, FewRxn achieves state-of-the-art accuracy in few-shot settings, significantly outperforming traditional fine-tuning methods. Additionally, our work provides insights into the impact of molecular representations on reaction knowledge transfer, demonstrating that knowledge captured under a molecular graph-based formulation consistently outperforms that learned in the form of SMILES generation in few-shot learning.
Local Structure-Adaptive Graph Filtering for Collaborative Filtering
The structural heterogeneity of user-item interaction graphs poses a fundamental challenge for graph-based recommender systems. While Graph Convolutional Networks (GCNs) have achieved remarkable success in collaborative filtering, their uniform low-pass filtering nature often fails to accommodate the varying spectral needs of nodes with different local structures, resulting in suboptimal performance. To address this issue, we propose a Structurally Sensitive Adaptive Graph Filter, dubbed SSAGF, a novel framework that enables structure-aware filtering on user-item graphs. SSAGF first clusters nodes based on local structural properties, then learns customized filters per group by reinterpreting the convolutional depth in GCNs. This adaptive mechanism ensures that nodes with distinct structural roles are treated appropriately, enhancing both accuracy and fairness without significantly increasing model complexity. To further improve scalability, SSAGF avoids costly eigenvalue decompositions by approximating spectral filters through Maclaurin series expansion, transforming the convolution into a pooling-like operation over standard GCN outputs. Extensive experiments on four benchmark datasets demonstrate that SSAGF consistently outperforms competitive baselines, especially in scenarios with high structural heterogeneity, offering a principled and efficient solution for structure-aware recommendation.
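The series-expansion idea reduces to evaluating a polynomial in the normalized adjacency, which is just a weighted pooling over successive propagation steps; a minimal NumPy sketch follows (the per-cluster coefficient learning is omitted and the coefficients shown are arbitrary).

    import numpy as np

    def polynomial_filter(A_hat, X, coeffs):
        # Truncated-series spectral filter: sum_k coeffs[k] * A_hat^k @ X, i.e. a
        # weighted pooling over successive GCN-style propagation steps. No
        # eigendecomposition is required.
        out = coeffs[0] * X
        H = X
        for c in coeffs[1:]:
            H = A_hat @ H                      # one more propagation hop
            out = out + c * H
        return out

    # In a structure-adaptive setting, each node cluster would get its own
    # coefficient vector, e.g. a smoother (more low-pass) profile for nodes with
    # sparse neighborhoods; the coefficients below are purely illustrative.
    # filtered = polynomial_filter(A_hat, X, coeffs=[0.4, 0.3, 0.2, 0.1])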
Learning Invariant Reliability under Diverse Contexts for Robust Multimedia Recommendation
In graph-based multimedia recommendation, accurately modeling item-item semantic similarity is crucial for constructing high-quality semantic structures. However, multimodal content often exhibits semantic inconsistencies across modalities, resulting in noisy or misleading similarity signals. We refer to this as modality mismatching, where unaligned representations, such as an image conveying semantics unrelated to its accompanying text, undermine the reliability of feature-based similarity estimation. Importantly, modality consistency is context sensitive, varying with the underlying semantic environment in which modalities are interpreted. This highlights the necessity of jointly modeling modality reliability and contextual semantics. To address this challenge, we propose RGSLMRec, a robust graph structure learning framework that models and exploits the semantic reliability of multimodal features under diverse contexts. At its core, RGSLMRec builds on the invariant learning paradigm and introduces two key innovations: (i) it simulates multiple perturbed semantic environments and employs environment-specific monotonic networks to estimate reliability; and (ii) it adopts a risk-invariant objective based on Variance Risk Extrapolation to enforce the learning of invariant reliability across environments. On top of this, (iii) RGSLMRec constructs reliability-guided item-item graphs and captures collaborative and semantic signals via a hybrid early-late fusion strategy. Extensive experiments on several real-world datasets and additional synthetically perturbed datasets demonstrate that RGSLMRec not only outperforms strong baselines but also exhibits superior robustness to modality mismatching.
Continuous Data Augmentation via Condition-Tokenized Diffusion Transformer for Sequential Recommendation
Data augmentation plays a crucial role in enhancing sequential recommendation (SR) by providing richer training signals. Recently, diffusion models (DMs) have been introduced into SR to generate realistic interaction sequences. However, existing DM-based methods face three key limitations: (1) semantic deviation. The rounding procedure, which maps continuous embeddings to discrete item sequences, may introduce semantic deviation; (2) preference misalignment. The explicit preference guidance is neglected during generation, resulting in synthetic sequences that misalign with users' actual interests; (3) suboptimal training strategy. The two-stage methods, which train the SR model and the DM separately, overlook their potential complementarity. To address these challenges, we propose Continuous Data Augmentation via Condition-Tokenized Diffusion Transformer for Sequential Recommendation (CATDiT). Specifically, CATDiT discards the rounding operation and leverages continuous embeddings as augmented data to preserve semantic integrity. Then, we guide the generation process with user intent via a condition-tokenized Diffusion Transformer, aligning synthetic sequences with users' real preferences. Finally, we propose an alternating optimization strategy to enable mutual learning between the SR model and the DM. Extensive experiments on five real-world datasets demonstrate that CATDiT consistently outperforms state-of-the-art baselines, validating its effectiveness in generating high-quality sequences and improving SR performance.
Incremental Learning for LLM-based Tokenization and Recommendation
Large Language Models for Recommendation (LLM4Rec) have shown great potential. Many LLM4Rec approaches technically leverage a learnable tokenizer to assign item identifiers and then enable a Recommender LLM (RecLLM) to process tokenized items and user interactions for recommendation. However, a key challenge in their real-world deployment is the need for continuous retraining over time to accommodate new items and evolving user interests. While existing retraining methods can be applied to RecLLMs, learnable tokenizers introduce additional retraining challenges. We conduct a comprehensive investigation into the joint retraining of RecLLMs and learnable tokenizers, identifying key issues such as identifier collision and identifier shifts across periods. To address these, we propose Reformer, an incremental learning framework to fine-tune RecLLMs and learnable tokenizers at each period. Reformer employs a dynamic codebook to mitigate identifier collision by appending new codes and enforcing a diversity-oriented code assignment constraint. Additionally, Reformer adopts an identifier freezing strategy to ensure the invariance of previously assigned item identifiers across retraining periods. We instantiate Reformer on two representative RecLLMs and conduct extensive experiments on three real-world datasets. Substantial results demonstrate its superior retraining performance, facilitating the real-world deployment of LLM4Rec.
SimFormer: Multilevel Transformer on Learnable Mesh Graphs for Engineering Simulation
Numerical simulation is important in real-world engineering systems, such as solid mechanics and aerodynamics. Hierarchical GNNs can learn engineering simulation with low simulation time and acceptable accuracy, but fail to represent complex interactions in simulation systems. In this paper, we propose a novel multilevel Transformer on learnable clusters, namely SimFormer. The key novelty of SimFormer is to interweave the learning of a learnable soft-cluster assignment algorithm and the inter-cluster/cluster-to-node attention. In the form of a closed loop, SimFormer learns the soft cluster assignment probabilities from the feedback signals provided by the attention, and the attention can leverage the learnable clusters to better represent long-range interactions. In this way, the learnable clusters can adaptively match actual simulation results, and the multilevel attention modules can also effectively represent node embeddings. Experiments on four datasets demonstrate the superiority of SimFormer over seven baseline approaches. For example, on the real dataset, our model outperforms the recent work Eagle with 17.36% lower RMSE and 27.03% fewer FLOPs. The code and datasets are available at: https://github.com/pro-orp/SimFormer.
Community Partition-based Source Localization with Adaptive Observers Deployment
With the rapid growth of social networks, fake news has become more prevalent than ever and causes substantial harm to society. Identifying its sources in a timely manner is crucial to prevent further damage. Existing source localization methods can be categorized into two distinct approaches: the first involves the deployment of observers followed by localization, while the second employs traditional community partitioning for source localization without considering community structure in observer deployment, resulting in suboptimal information acquisition. To address this issue, we propose Community Partition-Based Source Localization with Adaptive Observers Deployment (CSOL), which consists of three stages: In the first stage, community partitioning is achieved using contrastive learning with optimization and a feature extraction module that is highly correlated with the partition. In the second stage, we present the first work to adaptively deploy observers based on community importance, integrating community partitioning with observer placement. In the third stage, an early source estimation strategy is employed to enhance efficiency and accuracy. Experimental results on real-world networks demonstrate that CSOL outperforms other SOTA methods in both accuracy and efficiency.
MSOFormer: Multi-scale Transformer with Orthogonal Embedding and Frequency Modeling for Multivariate Time Series Forecasting
Multivariate Time Series Forecasting (MTSF) plays a critical role in diverse practical applications. Although Transformer-based models have recently achieved impressive results in this field, their performance is still hindered by three core challenges: complex temporal dependencies, diverse inter-variable correlations, and patterns that span multiple time scales. To address these issues, we propose MSOFormer, a Multi-scale Transformer with Orthogonal Embedding and Frequency Modeling. Specifically, the Dynamic Frequency Filter adaptively weights frequency components across variables based on input characteristics, enabling full-spectrum modeling and precise extraction of key frequency patterns. To improve inter-variable representation, we introduce Orthogonal Embedding, a novel projection strategy for queries and keys that enhances feature diversity in channel-wise self-attention. In addition, Multi-scale Patch Embedding captures temporal features across different scales, providing a comprehensive time series representation. To evaluate MTSF in cloud-native environments, we construct the first three Cloud Kafka cluster datasets, specifically curated for elastic message queue scaling scenarios. Extensive experiments across eleven real-world benchmark datasets demonstrate that MSOFormer consistently outperforms existing state-of-the-art methods, highlighting its effectiveness and broad applicability.
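As a rough sketch of what frequency-domain reweighting can look like, the snippet below filters one series with an rFFT, applies input-dependent per-frequency weights, and inverts the transform; the simple magnitude-based weighting stands in for the learned Dynamic Frequency Filter.

    import numpy as np

    def dynamic_frequency_filter(x, weight_fn):
        # Transform a series with the real FFT, reweight each frequency bin with
        # input-dependent weights, and transform back. `weight_fn` stands in for
        # the learned, input-conditioned weighting described in the paper.
        spec = np.fft.rfft(x, axis=-1)
        w = weight_fn(np.abs(spec))            # e.g. a small network on magnitudes
        return np.fft.irfft(spec * w, n=x.shape[-1], axis=-1)

    # Toy usage: emphasize the strongest frequency components of a noisy series.
    t = np.linspace(0, 20 * np.pi, 96)
    x = np.sin(t) + 0.3 * np.random.default_rng(0).standard_normal(96)
    y = dynamic_frequency_filter(x, lambda mag: mag / (mag.max() + 1e-8))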
Benefit from Rich: Tackling Search Interaction Sparsity in Search Enhanced Recommendation
In modern online platforms, search and recommendation (S&R) often coexist, offering opportunities for performance improvement through search-enhanced approaches. Existing studies show that incorporating search signals boosts recommendation performance. However, the effectiveness of these methods relies heavily on rich search interactions. They primarily benefit a small subset of users with abundant search behavior, while offering limited improvements for the majority of users who exhibit only sparse search activity. To address the problem of sparse search data in search-enhanced recommendation, we face two key challenges: (1) how to learn useful search features for users with sparse search interactions, and (2) how to design effective training objectives under sparse conditions. Our idea is to leverage the features of users with rich search interactions to enhance those of users with sparse search interactions. Based on this idea, we propose GSERec, a method that utilizes message passing on the User-Code Graphs to alleviate data sparsity in Search-Enhanced Recommendation. Specifically, we utilize Large Language Models (LLMs) with vector quantization to generate discrete codes, which connect similar users and thereby construct the graph. Through message passing on this graph, embeddings of users with rich search data are propagated to enhance the embeddings of users with sparse interactions. To further ensure that the message passing captures meaningful information from truly similar users, we introduce a contrastive loss to better model user similarities. The enhanced user representations are then integrated into downstream search-enhanced recommendation models. Experiments on three real-world datasets show that GSERec consistently outperforms baselines, especially for users with sparse search behaviors.
Docking-Aware Attention: Dynamic Protein Representations through Molecular Context Integration
Computational prediction of enzymatic reactions represents a crucial challenge in sustainable chemical synthesis across various scientific domains, ranging from drug discovery to materials science and green chemistry. These syntheses rely on highly adaptable protein catalysts that perform different molecular transformations depending on their molecular partners. Current approaches to protein representation in reaction prediction either ignore protein structure entirely or rely on static embeddings, failing to capture how proteins dynamically adapt their behavior to different substrates. We present Docking-Aware Attention (DAA), a novel architecture that generates dynamic, context-dependent protein representations by incorporating molecular docking information into the attention mechanism. DAA combines physical interaction scores from docking predictions with learned attention patterns to focus on protein regions most relevant to specific molecular interactions. We evaluate our method on enzymatic reaction prediction, where it outperforms previous state-of-the-art methods, demonstrating a 9.5% relative improvement on complex molecules and a 12.3% relative improvement on innovative reactions. Furthermore, we demonstrate the generalization capability of our learned representations on a drug-target interaction task. We show how DAA generates interpretable attention patterns that adapt to different molecular contexts through detailed ablation studies and visualizations. Our approach represents a general framework for context-aware protein representation, with potential applications across enzymatic synthesis planning and other protein-molecule interaction tasks. We open-source our implementation and pre-trained models to facilitate further research.
InstANNS: Scalable Approximate Nearest Neighbor Search via Cost-Efficient In-Storage Processing
Billion-scale approximate nearest neighbor search (ANNS) increasingly relies on disk-based indexes due to the rapid growth of modern datasets. Existing disk-augmented indexing systems, such as SPANN, often face performance bottlenecks due to limited host interface bandwidth, typically constrained by PCIe. To address this bottleneck, we introduce InstANNS, a storage-centric ANNS architecture that improves throughput and reduces data transfer by performing query-aware PQ filtering inside SSDs, without relying on GPUs. By offloading distance computations to the SSD controller and utilizing abundant internal bandwidth, our design transfers only highly relevant candidates to the host, significantly reducing PCIe traffic. To further optimize performance, we propose co-occurrence-aware PQ code placement, which co-locates frequently co-accessed candidates, and conditional PQ bypass, which reduces NAND reads by skipping low-utility filtering. We prototype InstANNS by extending SPANN and evaluate it using a device-level SSD simulator, meticulously calibrated against measurements from an actual SSD controller SoC (System-on-Chip) to ensure accurate performance evaluation. Experimental results show that InstANNS improves QPS by 2.15× over SPANN and QPS-per-dollar by 1.7× over FusionANNS at 90% Recall@10, while maintaining accuracy.
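For concreteness, a NumPy sketch of the query-aware PQ filtering that the design offloads to the SSD controller: per-subspace lookup tables are built once from the query, and each candidate's score is a gather-and-sum over its PQ code (shapes and the 256-centroid codebook size are illustrative, not the paper's exact configuration).

    import numpy as np

    def pq_adc_scores(query, codebooks, codes):
        # Asymmetric distance computation for product quantization.
        # codebooks: (n_sub, 256, d_sub); codes: (n_candidates, n_sub) integers;
        # query: flat vector of length n_sub * d_sub.
        n_sub, n_centroids, d_sub = codebooks.shape
        q_sub = query.reshape(n_sub, d_sub)
        # One (n_sub x n_centroids) table of squared sub-distances per query.
        tables = ((codebooks - q_sub[:, None, :]) ** 2).sum(-1)
        # Gather-and-sum per candidate; only the best-scoring IDs need to be
        # shipped over the host interface.
        return tables[np.arange(n_sub), codes].sum(-1)

    rng = np.random.default_rng(0)
    cb = rng.standard_normal((8, 256, 16))              # 8 subspaces, 16 dims each
    codes = rng.integers(0, 256, size=(1000, 8))
    q = rng.standard_normal(8 * 16)
    top = np.argsort(pq_adc_scores(q, cb, codes))[:10]  # 10 closest candidates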
Parse-LLM: A Prior-Free LLM Parser for Unknown System Logs
Log parsing extracts structured information from unstructured logs and serves as a fundamental pre-processing step for various log-based analytics and monitoring tasks. Recent advances have leveraged Large Language Models (LLMs) to handle log format complexities and enhance parsing performance. However, these methods heavily rely on labeled data, which is often scarce in rapidly evolving industrial systems, limiting their applicability in real-world scenarios. Moreover, the sheer volume of logs results in slow parsing and high computational costs, further hindering the deployment of LLM-based log parsing systems. To address these issues, we propose Parse-LLM, an unsupervised end-to-end log parsing framework based on LLMs. Specifically, we first develop a Log Decomposer Agent that leverages Chain-of-Thought (CoT) reasoning and callable tools, enabling the LLM to autonomously separate log headers from content. Next, we introduce the Hybrid Log Partition module, which segments logs by balancing commonalities and differences. Finally, we develop a novel Variation-aware Log Parsing module that allows the LLM to harness additional supervisory signals through comparative analysis of similar logs. Comprehensive experiments conducted on large-scale public datasets show that Parse-LLM outperforms state-of-the-art log parsers in an unsupervised setting, offering an effective and scalable solution for the practical application of unsupervised log parsing.
ProxySampler: Proxy Informativeness Estimation for Efficient Data Selection in Active Learning
Large-scale data analysis services require efficient periodic model updates to adapt to possibly changing data distributions. Manually labeling all available samples for task model updates is infeasible at a large sample scale. Active learning techniques iteratively select subsets of the most informative samples for labeling. From our experience of applying active learning in a real-world video analysis system, we identify a previously overlooked bottleneck of time cost: data selection. Existing active learning methods select data by estimating informativeness (e.g., output confidence) over all unlabeled samples in each iteration. This data selection process can take up to 42% of the time cost of end-to-end model updates in our system (the total includes the time for manual labeling, data selection, and model updates). To address the time cost bottleneck caused by data selection, we propose a new idea: proxy informativeness estimation. We start with modeling the time cost of data selection, from which we identify three key factors: unit estimation cost, the number of samples for estimation, and the number of iteration rounds. The influence of the first two factors increases cumulatively with the number of iteration rounds. Correspondingly, we design a proxy estimator and a sample pooling method, respectively. Our proxy estimator is a lightweight neural network for direct informativeness estimation that replaces the role of the high-cost task model, thus reducing the unit cost. Our sample pooling method leverages historical estimation results to narrow the scope of sample candidates. Based on the above design, we develop ProxySampler, which can be integrated with various active learning approaches as a plug-in. Experimental results show that integrating ProxySampler with state-of-the-art active learning methods can reduce the time cost by 53.6-83.3% (a 2.15-6.01x speedup) when achieving the same accuracy.
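A hedged scikit-learn sketch of the proxy idea: a small regressor learns to predict the task model's predictive entropy from cheap features, so later selection rounds can rank the unlabeled pool without running the expensive task model. The feature choice, regressor size, and entropy target are illustrative assumptions, not the paper's exact design.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def train_proxy(features, task_probs):
        # Fit a lightweight proxy that predicts the task model's predictive
        # entropy (an informativeness signal) from cheap features.
        entropy = -(task_probs * np.log(task_probs + 1e-12)).sum(axis=1)
        proxy = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
        return proxy.fit(features, entropy)

    def select_batch(proxy, pool_features, pool_ids, k):
        # Rank the candidate pool by proxy-estimated informativeness, take top-k.
        scores = proxy.predict(pool_features)
        return [pool_ids[i] for i in np.argsort(-scores)[:k]]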
Quantized Factor Identifiable Causal Effect Variational Autoencoder
Causal inference involves determining how interventions affect outcomes and explaining the underlying mechanisms, and it holds critical importance across various fields. A key assumption in causal inference is that the measured covariates form a sufficient adjustment set. However, this assumption often fails due to unobserved confounders, as confounding mechanisms are rarely fully captured by measured covariates alone. Recent research has attempted to address this challenge using variational autoencoders (VAEs), but these approaches face practical limitations, including unidentifiability and bias toward proxy variables. To overcome these issues, we propose a novel method that incorporates quantized factor identifiability into VAEs for causal effect estimation. This integration mitigates unidentifiability and reduces the dominance of proxy variables, thereby enhancing consistency and accuracy in causal inference. Extensive experiments on both simulated and real-world datasets demonstrate the robustness and effectiveness of our method, establishing a new benchmark in deep causal modeling.
Unified Molecule Pre-training with Flexible 2D and 3D Modalities: Single and Paired Modality Integration
Molecular representation learning plays a crucial role in advancing applications such as drug discovery and material design. Existing work leverages 2D and 3D modalities of molecular information for pre-training, aiming to capture comprehensive structural and geometric insights. However, these methods require paired 2D and 3D molecular data to train the model effectively and prevent it from collapsing into a single modality, posing limitations in scenarios where a certain modality is unavailable or computationally expensive to generate. To overcome this limitation, we propose FlexMol, a flexible molecule pre-training framework that learns unified molecular representations while supporting single-modality input. Specifically, inspired by the unified structure in vision-language models, our approach employs separate models for 2D and 3D molecular data, leverages parameter sharing to improve computational efficiency, and utilizes a decoder to generate features for the missing modality. This enables a multistage continuous learning process where both modalities contribute collaboratively during training, while ensuring robustness when only one modality is available during inference. Extensive experiments demonstrate that FlexMol achieves superior performance across a wide range of molecular property prediction tasks, and we also empirically demonstrate its effectiveness with incomplete data. Our code and data are available at https://github.com/tewiSong/FlexMol.
GSTBench: A Benchmark Study on the Transferability of Graph Self-Supervised Learning
Self-supervised learning (SSL) has shown great promise in graph representation learning. However, most existing graph SSL methods are developed and evaluated under a single-dataset setting, leaving their cross-dataset transferability largely unexplored and limiting their ability to leverage knowledge transfer and large-scale pretraining, factors that are critical for developing generalized intelligence beyond fitting training data. To address this gap and advance foundation model research for graphs, we present GSTBench, the first systematic benchmark for evaluating the transferability of graph SSL methods. We conduct large-scale pretraining on ogbn-papers100M and evaluate five representative SSL methods across a diverse set of target graphs. Our standardized experimental setup decouples confounding factors such as model architecture, dataset characteristics, and adaptation protocols, enabling rigorous comparisons focused solely on pretraining objectives. Surprisingly, we observe that most graph SSL methods struggle to generalize, with some performing worse than random initialization. In contrast, GraphMAE, a masked autoencoder approach, consistently improves transfer performance. We analyze the underlying factors that drive these differences and offer insights to guide future research on transferable graph SSL, laying a solid foundation for the ''pretrain-then-transfer'' paradigm in graph learning. Our code is available at https://github.com/SongYYYY/GSTBench.
DiRW: Path-Aware Digraph Learning for Heterophily
Recently, graph neural network (GNN) has emerged as a powerful representation learning tool for graph-structured data. However, most approaches are tailored for undirected graphs, neglecting the abundant information in the edges of directed graphs (digraphs). In fact, digraphs are widely used in the real world and have been shown to help address heterophily challenges. Despite recent advancements, existing spatial- and spectral-based DiGNNs have limitations due to their complex learning mechanisms and reliance on high-quality topology, resulting in low efficiency and unstable performance. To address these issues, we propose Directed Random Walk (DiRW), a plug-and-play strategy for most spatial-based DiGNNs and an innovative model that offers a new digraph learning paradigm. Specifically, it utilizes a direction-aware path sampler optimized from the perspectives of walk probability, length, and number in a weight-free manner by considering node profiles and topologies. Building upon this, DiRW incorporates a node-wise learnable path aggregator for generalized node representations. Extensive experiments on 9 datasets demonstrate that DiRW: (1) enhances most spatial-based methods as a plug-and-play strategy; (2) achieves SOTA performance as a new digraph learning paradigm. The source code and data are available at https://github.com/dhsiuu/DiRW.
Extracting Global Temporal Patterns Within Short Look-Back Windows for Traffic Forecasting
With the continuous expansion of urban areas, accurate and effective traffic forecasting has become essential for intelligent urban traffic management. As traffic data inherently exhibits temporal dynamics, modeling its temporal patterns is critical to improve prediction performance. However, constrained by computational complexity, existing methods rely primarily on short-term historical data, which is typically noisy and limits the ability to capture global temporal patterns. To address this issue, we propose a novel Dual-Stream Transformer model (DSformer) that effectively captures global temporal patterns through a time-index model. To mitigate the impact of noise in short look-back windows, DSformer explicitly learns a temporal matrix that encodes structured temporal dependencies. Furthermore, we design a time-index loss that encourages similar representations for adjacent time indices, thereby reducing error propagation across time steps. In parallel, a historical-value stream is employed to model local information. Finally, a self-adaptive learning module is constructed to flexibly and accurately fuse global and local information. Extensive experiments on real-world traffic forecasting tasks across ten diverse scenarios demonstrate that our method consistently outperforms state-of-the-art baselines while maintaining competitive efficiency. The code is available at https://github.com/sky836/DSFormer.git.
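For a concrete anchor, the sketch below shows one way a time-index smoothness loss of the kind the abstract describes could be written; the function name, weighting, and shapes are illustrative assumptions, not DSformer's implementation.

```python
import torch

def time_index_smoothness_loss(time_embeddings: torch.Tensor) -> torch.Tensor:
    """Illustrative loss encouraging adjacent time-index representations to be
    similar, in the spirit of the time-index stream described in the abstract.

    time_embeddings: (T, d) tensor, one learned embedding per time index.
    """
    # Penalize the squared distance between neighbouring time indices.
    diffs = time_embeddings[1:] - time_embeddings[:-1]   # (T-1, d)
    return (diffs ** 2).sum(dim=-1).mean()

# Usage: add this term to the forecasting loss with a small weight.
emb = torch.randn(288, 64, requires_grad=True)            # e.g. 288 five-minute slots per day
loss = time_index_smoothness_loss(emb)
loss.backward()
```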
Discovering Group Collapser for Network Resilience
Network resilience refers to the ability of a network to maintain its functionality despite perturbations, where resilience/robustness is shown when a substantial proportion of its nodes remain engaged even under changes. Such a phenomenon is common in real-world networks, such as computing power networks. Previous works demonstrate that the coreness of a user/node effectively captures the dynamics of user engagement. However, most existing works only consider changes in a single coreness value and thus fail to measure the overall network resilience. Subsequent works are either inefficient or do not consider the coreness-decrease scenario. In this paper, we propose and study the collapsed follower maximization problem, aiming to maximize the number of coreness-decreased vertices by finding a group collapser (collapsing a set of vertices) with a given budget. We prove that the problem is NP-hard and W[2]-hard parameterized by the budget b. To address the problem, we first present a Greedy algorithm that iteratively finds the best collapser in each of the budget b iterations. To further optimize the Greedy algorithm, we propose GreedyOpt, which leverages the shell component structure to accelerate the follower computation for a single collapser and prune the search space. Extensive experimental results on 8 real-world datasets demonstrate the effectiveness and efficiency of our algorithms.
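The b-round greedy loop described above can be sketched as follows; `followers` is a placeholder for the coreness-based follower computation, and none of the names or data structures come from the paper.

```python
def greedy_group_collapser(graph, candidates, budget, followers):
    """Pick `budget` collapser vertices one at a time, each round taking the
    vertex whose collapse decreases the coreness of the most remaining vertices.

    `followers(graph, collapsed, v)` is assumed to return the set of vertices
    whose coreness drops when v is collapsed on top of the set `collapsed`.
    `candidates` is a set of candidate vertices.
    """
    collapsed = set()
    for _ in range(budget):
        best_v, best_gain = None, -1
        for v in candidates - collapsed:
            gain = len(followers(graph, collapsed, v))
            if gain > best_gain:
                best_v, best_gain = v, gain
        if best_v is None:
            break
        collapsed.add(best_v)          # commit the best collapser of this round
    return collapsed
```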
AMBER: Adaptive Meta Balanced Paradigm for Heterogeneous Graph-Based Knowledge Tracing
Knowledge Tracing (KT) is a fundamental task in personalized education, aiming to predict student performance by modeling their evolving concept mastery. Recent state-of-the-art approaches adopt multi-graph architectures to capture diverse concept and behavior relations. However, such models often suffer from graph imbalance, where one graph branch dominates training, undermining the benefits of structural integration. To address this, we propose AMBER (Adaptive Meta-Balanced Ensemble Representation learning), a KT framework designed to promote balanced learning across heterogeneous graph structures. AMBER introduces an external dual-graph teacher to guide the learning of ensemble representations. As the teacher itself may encode graph imbalance bias, we further incorporate a meta-distillation strategy that adaptively adjusts the teacher using student feedback, amplifying signals beneficial to underperforming branches. In addition, an adaptive graph rebalancing strategy is introduced to balance the optimization of different graph branches in real time, preventing dominance by any single structure. Experiments on three real-world datasets show that AMBER consistently outperforms competitive baselines. By promoting balanced optimization across graphs, AMBER enables more effective integration of heterogeneous learning signals in KT, providing a robust and scalable solution for personalized education. Code is available at https://github.com/AMBER2025KT/AMBER2025CIKM.
Hearing the Meaning, Not the Mess: Beyond Literal Transcription for Spoken Language
With the rise of virtual communication and smart devices, speech has become the most natural medium of interaction. Yet it remains intrinsically difficult: speech is fleeting, unstructured, and disfluent, making key information prone to loss. Conventional Speech-to-Text (STT) systems attempt to acoustically reconstruct what was said. However, their frame-level alignment and rigid token-by-token decoding break down under noise, interruptions, or fragmentation. Humans, in contrast, readily grasp what was meant by exploiting syntax, discourse, pragmatics, and prosody. We argue for a paradigm shift from acoustic reconstruction to semantic transduction: inferring meaning directly from speech, abstracted from surface distortions. This shift raises two challenges: (C1) the lack of anchors between audio and meaning, and (C2) the need to maintain compositional semantics. To address these, we introduce CogTrans, a cognitively inspired speech-to-meaning framework. CogTrans tackles C1 through a Semantic Anchor Explorer, built on I-JEPA to capture higher-order regularities (prosodic rhythms, cross-frequency coarticulation, and discourse continuity), providing resilient semantic scaffolds under noise and fragmentation. For C2, it designs a Lexical-Semantic Harmonizer that dynamically integrates these anchors with lexical embeddings, thereby preserving fine-grained compositional fidelity in roles, order, and entities. Extensive experiments show that CogTrans delivers consistent and substantial gains under challenging conditions. On GigaSpeech, it achieves a 6.58% relative Word Error Rate (WER) reduction, and on the multilingual VoxPopuli benchmark, the gain climbs to 12.97% at 10 dB noise, a regime where conventional models typically collapse. Beyond literal accuracy, CogTrans also boosts semantic fidelity, with a 3.40% increase in ROUGE-L and 3.45% in USE-Sim, ensuring transcripts remain faithful not only in words but also in meaning. Together, these results underscore that CogTrans is robust in noisy, unconstrained environments, precisely the conditions where reliability matters most.
Dynamic Ensemble Member Selection for Data Stream Classification
Ensemble methods are widely recognized for their effectiveness in data stream classification. This paper introduces Dynamic Ensemble Member Selection (DEMS), a novel framework that dynamically selects a subset of classifiers from an ensemble for each individual prediction. DEMS ranks base learners based on estimated accuracy and predictive margin, using only the top-K members for prediction, where K is optimized in a self-adaptive manner. The proposed method significantly enhances predictive performance across various state-of-the-art ensemble algorithms, including Streaming Random Patches, Adaptive Random Forest, and Online Smooth Boost. Experimental results demonstrate that DEMS consistently improves classification accuracy while maintaining a minimal runtime overhead of just 11.66% compared to the original methods. This work highlights the potential of DEMS in adapting to concept drift and optimizing ensemble diversity, offering a practical solution for real-time data stream classification.
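The selection step described above can be illustrated with a short sketch; it assumes scikit-learn-style base learners with `predict_proba`, and the scoring, names, and combination rule are assumptions for illustration rather than the DEMS algorithm itself.

```python
import numpy as np

def dems_style_predict(classifiers, accuracy_est, x, top_k):
    """Rank ensemble members by estimated accuracy plus predictive margin on
    the current instance, then combine only the top-K members."""
    scores, votes = [], []
    for clf, acc in zip(classifiers, accuracy_est):
        proba = clf.predict_proba([x])[0]
        top2 = np.sort(proba)[-2:]
        margin = top2[-1] - top2[-2]              # confidence margin on this instance
        scores.append(acc + margin)
        votes.append(int(np.argmax(proba)))
    order = np.argsort(scores)[::-1][:top_k]      # indices of the top-K members
    selected = [votes[i] for i in order]
    return max(set(selected), key=selected.count) # majority vote of the selected members
```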
MPFormer: Adaptive Framework for Industrial Multi-Task Personalized Sequential Retriever
Modern industrial recommendation systems encounter a core challenge of multi-stage optimization misalignment: a significant semantic gap exists between the multi-objective optimization paradigm (such as jointly optimizing click-through rate, watch duration, and conversion rate) widely used in the ranking phase and the single-objective modeling in the retrieve phase. Although the mainstream industry solution achieves multi-objective coverage through parallel multi-path single-objective retrieve, this approach leads to linear growth of training and serving resources with the number of objectives and has inherent limitations in handling loosely coupled objectives. This paper proposes MPFormer, a dynamic multi-task Transformer framework, which systematically addresses the aforementioned issues through three innovative mechanisms. First, an objective-conditioned Transformer jointly encodes user behavior sequences and multi-task semantics through learnable attention modulation; second, personalized target weights are introduced to achieve dynamic adjustment of retrieve results; finally, user personalization information is incorporated into token representations and the Transformer structure to further enhance the model's representation ability. This framework has been successfully integrated into Kuaishou's short video recommendation system, stably serving over 400 million daily active users. It significantly improves user daily engagement and system operational efficiency. Practical deployment verification shows that, compared with traditional solutions, it effectively optimizes the multi-objective retrieve iteration paradigm while maintaining service response speed, providing a scalable multi-objective solution for industrial recommendation systems.
On the Cross-type Homophily of Heterogeneous Graphs: Understanding and Unleashing
Homophily, the tendency of similar nodes to connect, is a fundamental phenomenon in network science and a critical factor in the performance of graph neural networks (GNNs). While existing studies primarily explore homophily in homogeneous graphs, where nodes share the same type, real-world networks are often more accurately modeled as heterogeneous graphs (HGs) with diverse node types and intricate cross-type interactions. This structural diversity complicates the analysis of homophily, as traditional homophily metrics fail to account for distinct label spaces across node types. To address this limitation, we introduce the Cross-Type Homophily Ratio (CHR), a novel metric that quantifies homophily based on the similarity of target information across different node types. Additionally, we propose Cross-Type Homophily-guided Graph Editing (CTHGE), a novel method for improving heterogeneous graph neural networks (HGNNs) performance by optimizing cross-type connectivity using Cross-Type Homophily Ratio. Extensive experiments on five HG datasets with nine HGNNs validate the effectiveness of CTHGE, which delivers a maximum relative performance improvement of over 25% for HGNNs on node classification tasks, offering a fresh perspective on cross-type homophily in HGs learning.
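The abstract does not spell out the metric, but one plausible instantiation of a cross-type homophily ratio consistent with its description is sketched below; the similarity function, threshold, and edge set are illustrative assumptions, not the paper's definition.

```latex
% One plausible form of a cross-type homophily ratio: the fraction of
% cross-type edges whose endpoints carry similar target information,
% measured by a similarity s(.,.) and threshold tau (symbols illustrative).
\mathrm{CHR}(G) \;=\; \frac{1}{|E_{\mathrm{cross}}|}
  \sum_{(u,v)\in E_{\mathrm{cross}}} \mathbb{1}\!\left[\, s(y_u, y_v) \ge \tau \,\right]
```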
STKGNN: Scalable Spatio-Temporal Knowledge Graph Reasoning for Activity Recognition
The emergence of dynamic, high-volume data streams demands advanced reasoning frameworks to capture complex spatio-temporal relationships that are essential for enabling contextual understanding. However, current approaches often lack scalable and adaptable semantic representations in dynamic and spatio-temporal scenarios. To address this need, we introduce a novel Spatio-Temporal Knowledge approach based on Graph Neural Networks (STKGNN) for activity recognition. This framework performs graph-based reasoning over semantically enriched Spatio-Temporal Knowledge Graphs (STKGs) constructed from open-source video datasets. By leveraging these custom STKGs, we propose three advanced Graph Neural Network (GNN) based architectures to recognize various activities. Accordingly, we establish a comprehensive approach for spatio-temporal reasoning that adapts to diverse Knowledge Graph structures by addressing adaptability, scalability, and temporal complexities. This framework enhances activity recognition and provides a foundation for wider dynamic or real-time applications in domains including healthcare, autonomous systems, video surveillance, and beyond.
ECG-Doctor: An Interpretable Multimodal ECG Diagnosis Framework Based on Large Language Models
Electrocardiogram (ECG) diagnosis aims to automatically classify ECG recordings into clinically meaningful categories, playing a vital role in medical decision-making. Deep learning methods, while promising, demand extensive annotated data and lack interpretability. Large Language Models (LLMs) offer potential in low-data scenarios and generating interpretable outputs, yet their application to ECG diagnosis, especially leveraging multimodal data (e.g., raw signals, derived features, and clinical knowledge), remains underexplored. To address these challenges, we propose ECG-Doctor, an interpretable and multimodal ECG diagnosis framework based on LLMs. ECG-Doctor comprises four key components: (1) ECG Knowledge Acquisition Module, which integrates external medical knowledge and Chain-of-Thought (CoT) reasoning to address the inability of LLMs to follow standardized ECG diagnostic procedures; (2) ECG Feature Extraction Module, which incorporates domain knowledge to overcome LLMs' limitations in comprehensively understanding structured ECG features; (3) ECG Waveform Analysis Module, which introduces time-series ECG models to equip LLMs with the capability to interpret and reason over raw ECG signal morphologies; (4) KNN-based ECG Retrieval Module, which retrieves the top-k most similar ECG samples and guides LLMs through in-context learning (ICL), enabling them to differentiate and learn from variations across ECGs. The outputs of these modules are aggregated and provided to the LLM as diagnostic context, enabling ICL to perform comprehensive ECG diagnosis. This design effectively simulates the diagnostic reasoning process of experienced electrocardiologists. Extensive experiments on the PTB-XL dataset demonstrate that ECG-Doctor is compatible with various LLMs and consistently outperforms existing baselines at both 100 Hz and 500 Hz sampling rates, showcasing its strong versatility and robustness. Furthermore, ECG-Doctor provides well-grounded diagnostic explanations, highlighting its superior interpretability.
X-Troll: eXplainable Detection of State-Sponsored Information Operations Agents
State-sponsored trolls, malicious actors who deploy sophisticated linguistic manipulation in coordinated information campaigns, pose threats to online discourse integrity. While Large Language Models (LLMs) achieve strong performance on general natural language processing (NLP) tasks, they struggle with subtle propaganda detection and operate as ''black boxes'', providing no interpretable insights into manipulation strategies. This paper introduces X-Troll, a novel framework that bridges this gap by integrating explainable adapter-based LLMs with expert-derived linguistic knowledge to detect state-sponsored trolls and provide human-readable explanations for its decisions. X-Troll incorporates appraisal theory and propaganda analysis through specialized LoRA adapters, using dynamic gating to capture campaign-specific discourse patterns in coordinated information operations. Experiments on real-world data demonstrate that our linguistically informed approach achieves strong accuracy compared with both general LLM baselines and existing troll detection models, while providing enhanced transparency through expert-grounded explanations that reveal the specific linguistic strategies used by state-sponsored actors. X-Troll source code is available at: https://github.com/ltian678/xtroll_source/.
Selective Mixup for Debiasing Question Selection in Computerized Adaptive Testing
Computerized Adaptive Testing is a widely used technology for evaluating examinees' proficiency in online education platforms. By leveraging prior estimates of proficiency to select questions and updating the estimates iteratively based on responses, it enables personalized examinee modeling and has attracted substantial attention. Despite this progress, most existing works focus primarily on improving proficiency estimation accuracy, while overlooking the selection bias inherent in the adaptive process. Selection bias arises because the question selection is strongly influenced by the estimated proficiency, such as assigning easier questions to examinees with lower proficiency and harder ones to examinees with higher proficiency. Since the selection depends on prior estimation, this bias propagates into the diagnostic model, which is further amplified during iterative updates, leading to misaligned and biased predictions. Moreover, the imbalance in examinees' historical interactions often exacerbates bias in diagnostic models. To address this issue, we propose a debiasing framework consisting of two key modules: Cross-Attribute Examinee Retrieval and Selective Mixup-based Regularization. First, we retrieve balanced examinees with relatively even distributions of correct and incorrect responses and use them as neutral references for biased examinees. Then, Mixup is applied between each biased examinee and its matched balanced counterpart under label consistency. This augmentation enriches the diversity of bias-conflicting samples and smooths selection boundaries. Finally, extensive experiments on two benchmark datasets with multiple advanced diagnosis models have been conducted. The results demonstrate that our method substantially improves the generalization ability of question selection.
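The mixing step described above follows the standard Mixup recipe restricted to label-consistent pairs; the sketch below is a minimal illustration of that idea, and the function, sampling choice, and shapes are assumptions rather than the paper's code.

```python
import torch

def selective_mixup(x_biased, x_balanced, y, alpha=0.5):
    """Mix a biased examinee's representation with its matched balanced
    counterpart under label consistency (retrieval and regularization details
    of the framework are omitted).

    x_biased, x_balanced: (d,) feature tensors assumed to share the label y.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample()   # mixing coefficient
    x_mix = lam * x_biased + (1.0 - lam) * x_balanced
    return x_mix, y          # label unchanged because the pair is label-consistent
```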
CoHN: Context-Aware Hawkes Graph Network for Temporal Knowledge Graph Reasoning
Temporal Knowledge Graphs (TKGs) model dynamic events, and understanding temporal evolution is crucial for effective reasoning. While existing methods leverage Graph Neural Networks (GNNs) to model structural dependencies, they often rely on Recurrent Neural Networks (RNNs) to process sequences of graph structures. They struggle to (1) incorporate contextual information, (2) explicitly model long-term effects, and (3) handle in-equidistant data. To address these challenges, we propose Context-Aware Hawkes Graph Network (CoHN), a novel TKG reasoning approach based on the Hawkes process. CoHN features a tailored conditional intensity function that models TKG event occurrences. It is characterized by two additive terms: base intensity and historical influence, representing the spontaneous tendency and the influence of past events at in-equidistant time intervals. Firstly, we design a Contextual Encoder (CE) to encode contextual information for all entities and compute the base intensity. We then present an attention-based Evolutionary Encoder that captures local structural information and explicitly models long-term dependencies across the TKG. A self-exciting fusion module further aggregates historical evolutionary dependencies at all timestamps to quantify the final historical influence. Extensive experiments on common benchmarks demonstrate the superiority, robustness, and efficiency of our method.
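For reference, a generic Hawkes-style conditional intensity with the additive structure the abstract describes (base intensity plus decayed influence of past events at irregular time gaps) is shown below; the symbols and the exponential kernel are illustrative, not CoHN's tailored intensity.

```latex
% Base intensity plus accumulated, time-decayed influence of past events;
% the kernel phi and parameters are illustrative placeholders.
\lambda(t \mid \mathcal{H}_t) \;=\; \mu(t) \;+\; \sum_{t_i < t} \phi(t - t_i),
\qquad \text{e.g.}\quad \phi(\Delta t) = \alpha\, e^{-\beta \Delta t}
```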
SupLID: Geometrical Guidance for Out-of-Distribution Detection in Semantic Segmentation
Out-of-Distribution (OOD) detection in semantic segmentation aims to localize anomalous regions at the pixel level, advancing beyond traditional image-level OOD techniques to better suit real-world applications such as autonomous driving. Recent literature has successfully explored the adaptation of commonly used image-level OOD methods, primarily based on classifier-derived confidence scores (e.g., energy or entropy), for this pixel-precise task. However, these methods inherit a set of limitations, including vulnerability to overconfidence. In this work, we introduce SupLID, a novel framework that effectively guides classifier-derived OOD scores by exploiting the geometrical structure of the underlying semantic space, particularly using Linear Intrinsic Dimensionality (LID). While LID effectively characterizes the local structure of high-dimensional data by analyzing distance distributions, its direct application at the pixel level remains challenging. To overcome this, SupLID constructs a geometrical coreset that captures the intrinsic structure of the in-distribution (ID) subspace. It then computes OOD scores at the superpixel level, enabling both efficient real-time inference and improved spatial smoothness. We demonstrate that geometrical cues derived from SupLID serve as a complementary signal to traditional classifier confidence, enhancing the model's ability to detect diverse OOD scenarios. Designed as a post-hoc scoring method, SupLID can be seamlessly integrated with any semantic segmentation classifier at deployment time. Our results demonstrate that SupLID significantly enhances existing classifier-based OOD scores, achieving state-of-the-art performance across key evaluation metrics, including AUR, FPR, and AUP. Code is available at https://github.com/hdnugit/SupLID.
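Since LID estimation from distance distributions is central here, the standard Levina-Bickel-style maximum-likelihood estimator is sketched below for intuition; SupLID's coreset construction and superpixel aggregation are not reproduced, and the example distances are synthetic.

```python
import numpy as np

def lid_mle(distances: np.ndarray) -> float:
    """Maximum-likelihood LID estimate from the distances of a query point to
    its k nearest neighbours (Levina-Bickel-style estimator)."""
    r = np.sort(distances)                 # r[0] <= ... <= r[k-1]
    r_max = r[-1]
    return -1.0 / np.mean(np.log(r[:-1] / r_max))

# Example: distances of one embedding to its 20 nearest in-distribution points.
d = np.sort(np.random.rand(20)) + 0.1      # synthetic, strictly positive distances
print(lid_mle(d))                          # higher values suggest more "complex" local structure
```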
Neural Instrumented Factorization: Learning Dynamic Asset Pricing Factors and Loadings through Characteristics Control
Asset pricing theory rests on the principle that differences in expected returns across assets are driven by their exposures to systematic risk factors. Identifying the ''right'' factors, whether observable or latent, remains a central challenge in empirical finance. Traditional latent factor models offer a parsimonious framework for summarizing information from hundreds of observable firm characteristics; however, they are typically estimated solely from return matrices, which limits their ability to capture time-varying, firm-specific dynamics. This study proposes Neural Instrumented Factorization (NeurIF), a novel framework that leverages firm characteristics as instruments to learn economically meaningful and time-varying latent factors. NeurIF integrates spatial and temporal attention mechanisms to capture nonlinear relationships between firm characteristics and asset returns, jointly learning both the latent factors and their dynamic loadings. The model incorporates orthogonality constraints and deviation-based penalties to ensure the interpretability and alignment of latent factors with observed firm characteristics. Empirical evaluations on real-world asset pricing data reveal that NeurIF consistently outperforms several state-of-the-art transformer-based models in return prediction, with improvements ranging from 1% to 18% in test data. Furthermore, the learned factor loadings can generate statistically significant long-short portfolio returns and are not subsumed by other observable factors. The embedded latent factors also exhibit strong explanatory power across several cross-sectional asset pricing anomalies, highlighting their economic relevance and robustness.
KIEPrompter: Leveraging Lightweight Models' Predictions for Cost-Effective Key Information Extraction using Vision LLMs
Key information extraction (KIE) from visually rich documents, such as receipts and forms, involves a deep understanding of textual, visual, and layout feature information. Transformers fine-tuned for KIE achieve state-of-the-art performance but lack generality and portability across different domains. In contrast, vision large language models (VLLMs) offer higher flexibility and zero-shot capability but fall short with domain-specific layout relations unless performing a resource-demanding supervised fine-tuning. To reach the best compromise between lightweight models and VLLMs, we propose KIEPrompter, a cost-effective LLM-based KIE approach that leverages the predictions of lightweight models as external knowledge injected into VLLM prompts. By incorporating these auxiliary predictions, VLLMs are guided to attend to relevant multimodal content without ad hoc training. The accuracy results achieved by KIEPrompter in three benchmark document collections are superior to those of VLLMs in both zero-shot and layout-sensitive scenarios. We compare various strategies for incorporating lightweight model predictions, ranging from coarse-grained predictions without explicit confidence scores to fine-grained per-element network logits. We also demonstrate that our approach is robust to the absence of specific classes in trained lightweight models, as the VLLMs' pre-training compensates for the limited generality of lightweight models.
FairAD: Computationally Efficient Fair Graph Clustering via Algebraic Distance
Due to the growing concern about unsavory behaviors of machine learning models toward certain demographic groups, the notion of 'fairness' has recently drawn much attention from the community, thereby motivating the study of fairness in graph clustering. Fair graph clustering aims to partition the set of nodes in a graph into k disjoint clusters such that the proportion of each protected group within each cluster is consistent with the proportion of that group in the entire dataset. It is, however, computationally challenging to incorporate fairness constraints into existing graph clustering algorithms, particularly for large graphs. To address this problem, we propose FairAD, a computationally efficient fair graph clustering method. It first constructs a new affinity matrix based on the notion of algebraic distance such that fairness constraints are imposed. A graph coarsening process is then performed on this affinity matrix to find representative nodes that correspond to k clusters. Finally, a constrained minimization problem is solved to obtain the solution of fair clustering. Experiment results on the modified stochastic block model and six public datasets show that FairAD can achieve fair clustering while being up to 40 times faster compared to state-of-the-art fair graph clustering algorithms.
Variety Is the Spice of Life: Detecting Misinformation with Dynamic Environmental Representations
The proliferation of misinformation across diverse social media platforms has drawn significant attention from both academic and industrial communities due to its detrimental effects. Accordingly, automatically distinguishing misinformation, dubbed Misinformation Detection (MD), has become an increasingly active research topic. The mainstream methods formulate MD as a static learning paradigm, which learns the mapping between the content, links, and propagation of news articles and the corresponding manual veracity labels. However, the static assumption is often violated, since in real-world scenarios, the veracity of news articles may vacillate within the dynamically evolving social environment. To tackle this problem, we propose a novel framework, namely Misinformation detection with Dynamic Environmental Representations (MISDER). The basic idea of MISDER lies in learning a social environmental representation for each period and employing a temporal model to predict the representation for future periods. In this work, we specify the temporal model as the LSTM model, continuous dynamics equation, and pre-trained dynamics system, suggesting three variants of MISDER, namely MISDER-LSTM, MISDER-ODE, and MISDER-PT, respectively. To evaluate the performance of MISDER, we compare it to various MD baselines across 2 prevalent datasets, and the experimental results demonstrate the effectiveness of our proposed model.
SPARK: Adaptive Low-Rank Knowledge Graph Modeling in Hybrid Geometric Spaces for Recommendation
Knowledge Graphs (KGs) enhance recommender systems but face challenges from inherent noise, sparsity, and Euclidean geometry's inadequacy for complex relational structures, critically impairing representation learning, especially for long-tail entities. Existing methods also often lack adaptive multi-source signal fusion tailored to item popularity. This paper introduces SPARK, a novel multi-stage framework systematically tackling these issues. SPARK first employs Tucker low-rank decomposition to denoise KGs and generate robust entity representations. Subsequently, an SVD-initialized hybrid geometric GNN concurrently learns representations in Euclidean and Hyperbolic spaces; the latter is strategically leveraged for its aptitude in modeling hierarchical structures, effectively capturing semantic features of sparse, long-tail items. A core contribution is an item popularity-aware adaptive fusion strategy that dynamically weights signals from collaborative filtering, refined KG embeddings, and diverse geometric spaces for precise modeling of both mainstream and long-tail items. Finally, contrastive learning aligns these multi-source representations. Extensive experiments demonstrate SPARK's significant superiority over state-of-the-art methods, particularly in improving long-tail item recommendation, offering a robust, principled approach to knowledge-enhanced recommendation. Implementation code is available online at https://github.com/Applied-Machine-Learning-Lab/SPARK.
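As a minimal sketch of the low-rank denoising idea, Tucker decomposition of a (head entity, relation, tail entity) tensor can be performed with the tensorly library as below; the tensor construction, sizes, and ranks are illustrative placeholders, not SPARK's configuration.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# Synthetic KG tensor: entities x relations x entities (placeholder sizes).
kg_tensor = tl.tensor(np.random.rand(200, 10, 200))

# Low-rank Tucker factorization: a small core plus one factor matrix per mode.
core, factors = tucker(kg_tensor, rank=[32, 4, 32])

# Reconstructing from the low-rank factors discards high-rank (noisy) components.
kg_denoised = tl.tucker_to_tensor((core, factors))

entity_embeddings = factors[0]        # (200, 32) robust entity representations
```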
Full-Atom Protein-Protein Interaction Prediction via Atomic Equivariant Attention Network
Protein-protein Interaction (PPI) prediction, which aims to identify the interactions between proteins within a biological system, is an important problem in understanding disease mechanisms and drug discovery. Recently, Equivariant Graph Neural Networks (E3-GNNs) are advanced computational models that provide a powerful solution for accurately predicting PPIs by preserving the geometric integrity of protein interactions. However, most E3-GNNs model protein interactions at the residue level, potentially neglecting critical atomic details and side-chain conformations. In this paper, we propose a novel model, MEANT, designed to adaptively extract atom-level geometric information from varying numbers of atoms within different residues for PPI prediction. Specifically, we define a full-atom graph that contains atomic geometry and guides the message passing under the structure of residues. We also design a geometric relation extractor to integrate geometric information from different residues and adaptively handle variations in the number of atoms within each residue. Finally, we adopt the attention mechanism to update the residue representation and the atomic coordinates within a residue. Experimental results show that our proposed model, MEANT, significantly outperforms state-of-the-art methods on three typical PPI prediction tasks. Our code and data are available on GitHub at https://github.com/BUPT-GAMMA/MEANT.
AdaHet-MKD: An Adaptive Heterogeneous Multi-teacher Knowledge Distillation for Medical Image Analysis
Contrastive Language-Image Pre-training (CLIP) has emerged as an effective framework for multi-modal representation learning, achieving notable success in diverse tasks such as medical image analysis. However, CLIP's adoption in medical image applications is restricted by its significant computational demands, creating implementation challenges in resource-constrained clinical environments. While knowledge distillation offers an effective approach for model compression with preserved accuracy, existing methods suffer from two fundamental limitations. Firstly, existing methods focus on learning better information from single models while ignoring the fact that student models can generalize well under the guidance of multiple teachers. Secondly, they overlook the complementary information in the CLIP model, where the text encoder and image encoder can be leveraged as heterogeneous information to teach one single modality. To tackle these challenges, we propose an Adaptive Heterogeneous Multi-teacher Knowledge Distillation (AdaHet-MKD) framework for effective knowledge transfer across heterogeneous text-image models and among multiple teacher models. The key innovations include: (i) adaptively determining the contribution of each teacher model to specific instances, thereby generating integrated soft logits, and (ii) enabling the student model to operate independently of the teacher model's architecture, which enhances flexibility in teacher-student pairings. Experimental evaluations on publicly available medical datasets demonstrate that our approach achieves state-of-the-art performance compared to baselines.
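To make the integrated-soft-logit idea concrete, a generic weighted multi-teacher distillation loss is sketched below; the per-instance weights are taken as given, and the names, temperature, and shapes are assumptions rather than AdaHet-MKD's weight-learning procedure.

```python
import torch
import torch.nn.functional as F

def weighted_multi_teacher_kd(student_logits, teacher_logits_list, weights, T=4.0):
    """Distill from several teachers with per-instance fusion weights.

    student_logits: (B, C); teacher_logits_list: list of (B, C) tensors;
    weights: (B, K) with rows summing to 1, K = number of teachers.
    """
    stacked = torch.stack(teacher_logits_list, dim=1)        # (B, K, C)
    soft = F.softmax(stacked / T, dim=-1)                    # per-teacher soft targets
    merged = (weights.unsqueeze(-1) * soft).sum(dim=1)       # adaptively fused soft logits
    log_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_student, merged, reduction="batchmean") * (T * T)
```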
GegenNet: Spectral Convolutional Neural Networks for Link Sign Prediction in Signed Bipartite Graphs
Given a signed bipartite graph (SBG) G with two disjoint node sets U and V, the goal of link sign prediction is to predict the signs of potential links connecting U and V based on known positive and negative edges in G. The majority of existing solutions towards link sign prediction mainly focus on unipartite signed graphs, which are sub-optimal due to the neglect of node heterogeneity and unique bipartite characteristics of SBGs. To this end, recent studies adapt graph neural networks to SBGs by introducing message-passing schemes for both inter-partition (U x V) and intra-partition (U x U or V x V) node pairs. However, the fundamental spectral convolutional operators were originally designed for positive links in unsigned graphs, and thus, are not optimal for inferring missing positive or negative links from known ones in SBGs. Motivated by this, this paper proposes GegenNet, a novel and effective spectral convolutional neural network model for link sign prediction in SBGs. In particular, GegenNet achieves enhanced model capacity and high predictive accuracy through three main technical contributions: (i) fast and theoretically grounded spectral decomposition techniques for node feature initialization; (ii) a new spectral graph filter based on the Gegenbauer polynomial basis; and (iii) multi-layer sign-aware spectral convolutional networks alternating Gegenbauer polynomial filters with positive and negative edges. Our extensive empirical studies reveal that GegenNet can achieve significantly superior performance (up to a gain of 4.28% in AUC and 11.69% in F1) in link sign prediction compared to 11 strong competitors over 6 benchmark SBG datasets.
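For readers unfamiliar with the polynomial basis used here, the Gegenbauer (ultraspherical) polynomials satisfy the standard three-term recurrence below, which is what makes them convenient to evaluate as a spectral filter basis; the filter design itself is GegenNet's contribution and is not reproduced.

```latex
% Standard Gegenbauer recurrence (alpha > -1/2, alpha != 0):
C_0^{(\alpha)}(x) = 1, \qquad C_1^{(\alpha)}(x) = 2\alpha x,
\qquad
n\, C_n^{(\alpha)}(x) \;=\; 2x\,(n+\alpha-1)\, C_{n-1}^{(\alpha)}(x) \;-\; (n+2\alpha-2)\, C_{n-2}^{(\alpha)}(x)
```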
Cequel: Cost-Effective Querying of Large Language Models for Text Clustering
Text clustering aims to automatically partition a collection of documents into coherent groups based on their linguistic features. In the literature, this task is formulated either as metric clustering over pre-trained text embeddings or as graph clustering based on pairwise similarities derived from an oracle, e.g., a large machine learning model. Recent advances in large language models (LLMs) have significantly improved this field by providing high-quality contextualized embeddings and accurate semantic similarity estimates. However, leveraging LLMs at scale introduces substantial computational and financial costs due to the large number of required API queries or inference calls. To address this issue, we propose Cequel, a cost-effective framework that achieves accurate text clustering under a limited budget of LLM queries. At its core, Cequel constructs must-link and cannot-link constraints by selectively querying LLMs on informative text pairs or triplets, identified via our proposed algorithms, EdgeLLM and TriangleLLM. These constraints are then utilized in a weighted constrained clustering algorithm to form high-quality clusters. Specifically, EdgeLLM and TriangleLLM employ carefully designed greedy selection strategies and prompting techniques to identify and extract informative constraints efficiently. Experiments on multiple benchmark datasets demonstrate that Cequel consistently outperforms existing methods in unsupervised text clustering under the same query budget.
TableTime: Reformulating Time Series Classification as Training-Free Table Understanding with Large Language Models
Large language models (LLMs) have shown promise in multivariate time series classification (MTSC). To effectively adapt LLMs for MTSC, it is crucial to generate comprehensive and informative data representations. Most methods utilizing LLMs encode numerical time series into the model's latent space, aiming to align with the semantic space of LLMs for more effective learning. Despite effectiveness, we highlight three limitations that these methods overlook: (1) they struggle to incorporate temporal and channel-specific information, both of which are essential components of multivariate time series; (2) aligning the learned representation space with the semantic space of the LLMs proves to be a significant challenge; (3) they often require task-specific retraining, preventing training-free inference despite the generalization capabilities of LLMs. To bridge these gaps, we propose TableTime, which reformulates MTSC as a table understanding task. Specifically, TableTime introduces the following strategies: (1) utilizing tabular form to unify the format of time series, facilitating the transition from the model-centric approach to the data-centric approach; (2) representing time series in text format to facilitate seamless alignment with the semantic space of LLMs; (3) designing a knowledge-task dual-driven reasoning framework, TableTime, integrating contextual information and expert-level reasoning guidance to enhance LLMs' reasoning capabilities and enable training-free classification. Extensive experiments conducted on 10 publicly available benchmark datasets from the UEA archive validate the substantial potential of TableTime to be a new paradigm for MTSC. The code is publicly available. https://github.com/realwangjiahao/TableTime.
Weakly Supervised Fine-grained Span-Level Framework for Chinese Radiology Report Quality Assurance
Quality Assurance (QA) for radiology reports refers to judging whether the junior reports (written by junior doctors) are qualified. The QA scores of one junior report are given by the senior doctor(s) after reviewing the image and junior report. This process incurs intensive labor costs for senior doctors. Additionally, the QA scores may be inaccurate for reasons like diagnosis bias, the ability of senior doctors, and so on. To address this issue, we propose a Span-level Quality Assurance EvaluaTOR (Sqator) to mark QA scores automatically. Unlike the common document-level semantic comparison method, we try to analyze the semantic difference by exploring more fine-grained text spans. Specifically, Sqator measures QA scores by measuring the importance of revised spans between junior and senior reports, and outputs the final QA scores by merging all revised span scores. We evaluate Sqator using a collection of 12,013 radiology reports. Experimental results show that Sqator can achieve competitive QA scores. Moreover, the importance scores of revised spans are also consistent with the judgments of senior doctors.
ACMCG: A Cost-effective Active Clustering with Minimal Constraint Graph
Active clustering enhances traditional semi-supervised clustering by introducing machine-led interaction, where informative constraints are dynamically selected and posed to humans. This enables goal-driven interaction and reduces the number of required constraints for achieving high-quality clustering. In this paper, we propose a newly designed Active Clustering framework with Minimal Constraint Graph (ACMCG). ACMCG operates on two cooperating tailored sparse graphs: a tree-structured graph (clustering tree) representing the nested clustering result, and a minimal constraint graph that supports constraint deduction during iterative refinement. In each refinement round, (a) the most suspicious edge in the tree is identified for constraint verification; (b) if a cannot-link constraint is confirmed, a pruning-and-grafting approach is performed to refine the clustering tree, guided by our proposed constraint deduction strategies; (c) the constraint is either deduced from the minimal constraint graph using transitive and probabilistic deduction, or obtained via user interaction when deduction fails. Extensive experiments across diverse domains demonstrate that ACMCG consistently outperforms both classical and state-of-the-art methods in accuracy, while significantly reducing the number of user-provided constraints and maintaining low computational cost, highlighting its cost-effectiveness in real-world applications.
Strong Forgetting for ALCQ-Ontologies
Forgetting is a non-standard reasoning procedure used to refine an ontology into a sub-signature by eliminating symbols not included in this subset, addressing fundamental challenges in knowledge management where ontology refinement and reuse are crucial for efficient information processing. It has two forms: weak forgetting (aka uniform interpolation), which preserves entailments within the source language, and strong forgetting, which additionally ensures model preservation modulo the eliminated symbols. This makes the latter significantly more challenging to compute. In this paper, we present the first method for strong role forgetting in description logics with qualified number restrictions (Q). In particular, the method takes ALCQ-ontologies as input, yielding output either in ALCQ or in ALCQ(∇) by further incorporating the universal role ∇ to avoid information loss. This preserves model-theoretic properties crucial for applications such as modal correspondence theory and second-order quantifier elimination. While the method guarantees termination and soundness, its completeness is inherently constrained by the undecidability of strong forgetting. However, empirical evaluations on the Oxford-ISG and BioPortal benchmarks show that this theoretical limitation barely impedes practical utility, with experimental results demonstrating strong success rates and remarkably high efficiency.
Spatio-Temporal Wavelet Enhanced Attention Mamba for Stock Price Forecasting
Stock price forecasting remains a critical challenge due to market non-stationarity and the influence of multiple factors. Existing studies apply frequency domain analysis methods to mitigate the impacts of non-stationarity by decoupling high- and low-frequency variation patterns. However, these approaches primarily focus on single series decomposition while neglecting cross frequency interactions among different stocks. Moreover, as a key indicator of overall market trends, market index information is inadequately utilized by current methods. In this paper, we propose STEAM, a Spatio-Temporal Wavelet Enhanced Attention Mamba model. We introduce Discrete Wavelet Transform (DWT) to disentangle multi-frequency temporal features and propose Wavelet Enhanced Attention (WEA) to capture cross frequency spatial dependencies, effectively leveraging both local and global inter-stock relationships. To extract the synergistic spatio-temporal dependencies in stock data, an AMamba module is designed that integrates WEA into the Mamba-2 architecture. Additionally, to further enhance the model's perception of macro-market conditions, we incorporate the market index as a prefix, guiding predictions with holistic market information in both spatial and temporal dependency learning. Extensive experiments across multiple national stock markets demonstrate that STEAM achieves state-of-the-art forecasting performance.
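As a quick illustration of the DWT decoupling step, a multi-level discrete wavelet decomposition of a price series can be computed with PyWavelets as below; the wavelet family, level, and synthetic series are placeholders, not STEAM's settings.

```python
import numpy as np
import pywt

# Synthetic closing-price series (random walk) standing in for one stock.
prices = np.cumsum(np.random.randn(512))

# Multi-level DWT: one approximation band plus detail bands per level.
coeffs = pywt.wavedec(prices, wavelet="db4", level=3)   # [cA3, cD3, cD2, cD1]
low_freq_trend = coeffs[0]                              # slow, trend-like variation
high_freq_details = coeffs[1:]                          # faster fluctuations

# The inverse transform recovers the original series from the bands.
reconstructed = pywt.waverec(coeffs, wavelet="db4")
```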
GraphRCG: Self-Conditioned Graph Generation
Graph generation aims to create new graphs that closely align with a target graph distribution. Existing works often implicitly capture this distribution by aligning the output of a generator with each training sample. As such, the overview of the entire distribution is not explicitly captured and used for graph generation. In contrast, in this work, we propose a novel self-conditioned graph generation framework designed to explicitly model graph distributions and employ these distributions to guide the generation process. We first perform self-conditioned modeling to capture the graph distributions by transforming each graph sample into a low-dimensional representation and optimizing a representation generator to create new representations reflective of the learned distribution. Subsequently, we leverage these bootstrapped representations as self-conditioned guidance for the generation process, thereby facilitating the generation of graphs that more accurately reflect the learned distributions. We conduct extensive experiments on generic and molecular graph datasets. Our framework, GraphRCG, demonstrates superior performance over existing state-of-the-art graph generation methods in terms of graph quality and fidelity to training data.
LCHGNN: Towards Distributed Hypergraph Neural Network Training Based on Communication Graphs with Lightweight Communication Optimization
Hypergraph Neural Networks (HGNNs) build on Graph Neural Networks (GNNs) by using hyperedges to capture complex, high-order relationships in data. However, training HGNNs on large hypergraphs is limited by computational and memory bottlenecks on a single machine. To overcome this, we propose LCHGNN, a distributed training method based on a new data structure called the communication graph, which simplifies hypergraph communication by representing cut hyperedges as vertices for structured message passing. LCHGNN employs a vertex-centric, hyperedge-replication-based storage scheme and introduces specialized forward and backward propagation mechanisms tailored for distributed execution. To mitigate communication overhead, we propose a lightweight optimization strategy that employs full synchronization in the initial round, followed by lightweight synchronization in subsequent rounds. Additionally, we present a learnable semi-supervised synchronization (LSS) aggregation mechanism for adaptive hyperedge selection. Extensive experiments on benchmark datasets demonstrate that LCHGNN preserves training accuracy while substantially reducing communication costs and enhancing scalability. This work addresses a critical gap in distributed HGNN research by delivering a communication-efficient and scalable training method, thereby facilitating the application of hypergraph learning to large-scale problems.
MFAE: Multimodal Feature Adaptive Enhancement for Fake News Video Detection
With the rapid global growth of short video platforms, the spread of fake news has become increasingly prevalent, creating an urgent demand for effective automated detection methods. Current approaches typically rely on feature extractors to gather information from multiple modalities and then generate predictions through classifiers. However, these methods often fail to fully utilize the complex information across all modalities and overlook the potential for video manipulation, limiting their overall performance. To tackle these issues, we propose MFAE, a novel framework for Multimodal Feature Adaptive Enhancement for Fake News Video Detection. The framework starts by extracting semantic and emotional features from the news, which are the basis for generating coarse multimodal representations. These representations are further refined through Adaptive Enhancement, a module specifically designed to strengthen the visual and audio modalities. Subsequently, spatial and temporal features are extracted separately, with temporal features undergoing additional refinement via a Temporal Enhancement module. The final result is obtained by feeding the individually enhanced features into the multimodal feature integration module for interaction. Comprehensive experiments on two benchmark datasets highlight the exceptional performance of MFAE in detecting fake news on short video platforms. Specifically, the method achieves accuracy improvements of 2.21% and 4.35% on FakeSV and FakeTT, respectively.
WDformer: A Wavelet-based Differential Transformer Model for Time Series Forecasting
Time series forecasting has various applications, such as meteorological rainfall prediction, traffic flow analysis, financial forecasting, and operational load monitoring for various systems. Due to the sparsity of time series data, relying solely on time-domain or frequency-domain modeling limits the model's ability to fully leverage multi-domain information. Moreover, when applied to time series forecasting tasks, traditional attention mechanisms tend to over-focus on irrelevant historical information, which may introduce noise into the prediction process, leading to biased results. We propose WDformer, a wavelet-based differential Transformer model. This study employs the wavelet transform to conduct a multi-resolution analysis of time series data. By leveraging the advantages of joint representation in the time-frequency domain, it accurately extracts the key information components that reflect the essential characteristics of the data. Furthermore, we apply attention mechanisms on inverted dimensions, allowing the attention mechanism to capture relationships between multiple variables. When performing attention calculations, we introduce the differential attention mechanism, which computes the attention score by taking the difference between two separate softmax attention matrices. This approach enables the model to focus more on important information and reduce noise. WDformer has achieved state-of-the-art (SOTA) results on multiple challenging real-world datasets, demonstrating its accuracy and effectiveness. Code is available at https://github.com/xiaowangbc/WDformer.
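The difference-of-softmax idea can be sketched in a few lines; the function below is an illustrative single-head version with an assumed fixed scaling factor, and it omits WDformer's wavelet and inverted-dimension components.

```python
import torch
import torch.nn.functional as F

def differential_attention(q1, k1, q2, k2, v, lam=0.5):
    """Attention output whose score map is the difference of two softmax
    attention matrices, so that common-mode (noisy) attention cancels out.

    q1, k1, q2, k2: (B, T, d); v: (B, T, d_v); lam weights the subtracted map.
    """
    d = q1.size(-1)
    a1 = F.softmax(q1 @ k1.transpose(-2, -1) / d ** 0.5, dim=-1)   # (B, T, T)
    a2 = F.softmax(q2 @ k2.transpose(-2, -1) / d ** 0.5, dim=-1)
    return (a1 - lam * a2) @ v
```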
MGFSG-EE: A Method based on Multi-grained Fusion and Scene Graph Enhancement for Event Extraction
The Multimedia Event Extraction (MEE) task, a core task in the field of event analysis, benefits many downstream applications. Existing MEE methods focus on fusing the co-occurring information in images and text, failing to model the correlation of background information between images and text, which makes it difficult to extract events and arguments in complex scenarios. Meanwhile, neglecting the interaction information between objects in images leads to the loss of event information and incomplete argument extraction. To address these issues, we propose a novel method based on Multi-grained Fusion and Scene Graph Enhancement (MGFSG-EE). It introduces a Multi-grained Fusion Module, which captures co-occurring information between image and text through dynamic screening of cross-modal features for coarse-grained fusion, and builds a Multimodal Graph with graph convolutional networks (GCNs) to achieve fine-grained interaction fusion at the vector level, mining semantic associations and contextual information between image and text. Moreover, MGFSG-EE constructs a Scene Graph to model the spatial and semantic relationships among objects and uses GCN to learn rich event representations. On the M2E2 benchmark dataset, MGFSG-EE outperforms the existing SOTA baselines, particularly on visual and multimedia event tasks: the F1 score for event trigger extraction improves by 3.2% and 2.3%, respectively, and for event argument extraction by 2.7% and 2.8%, respectively, verifying the effectiveness of the proposed method.
Rethinking Lipschitzness Data-free Backdoor Defense
Deep Neural Networks (DNNs) have demonstrated remarkable success across various applications, yet some studies reveal their vulnerability to backdoor attacks, where attackers manipulate model behavior under specific conditions using triggers, significantly compromising model integrity. Addressing this critical security issue requires robust defence mechanisms to ensure the reliability of DNN models. However, most existing defence mechanisms heavily rely on specialized defence datasets, which are often difficult to obtain due to data privacy and security concerns. This highlights the urgent need for effective data-free defence strategies. In this work, we propose Lipschitzness Precise Pruning (LPP), a novel data-free backdoor defence algorithm that leverages the properties of Lipschitz functions to detect and mitigate backdoor vulnerabilities by pruning neurons with strong backdoor correlations while fine-tuning unaffected neurons. Our approach optimizes the computation of the Lipschitz constant using dot product properties, allowing for efficient and precise identification of compromised neurons without the need for clean defence data. This method addresses the limitations of existing data-free defences and extends the scope of backdoor mitigation to include fully connected layers, ensuring comprehensive protection of DNN models. As our approach does not require data exchange, it can be implemented efficiently and effectively in diverse environments. Extensive experiments demonstrate that LPP outperforms state-of-the-art defence approaches without the need for additional defence datasets. We release our code at: https://github.com/LMBTough/LPP
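To give a flavour of data-free, Lipschitz-motivated pruning, the sketch below scores each output neuron of a fully connected layer by its incoming weight norm (a simple quantity related to an upper bound on that neuron's Lipschitz constant) and zeroes out statistical outliers; this is an illustrative assumption-laden sketch, not LPP's dot-product-optimized computation or its fine-tuning step.

```python
import torch

def prune_high_lipschitz_neurons(weight: torch.Tensor, threshold: float = 3.0):
    """Data-free pruning sketch for a fully connected layer.

    weight: (out_features, in_features). Neurons whose sensitivity proxy is an
    outlier (mean + threshold * std) are pruned as suspected backdoor neurons.
    """
    scores = weight.norm(dim=1)                   # per-neuron sensitivity proxy
    mu, sigma = scores.mean(), scores.std()
    mask = scores <= mu + threshold * sigma       # True = keep this neuron
    pruned = weight.clone()
    pruned[~mask] = 0.0                           # zero out suspected neurons
    return pruned, mask
```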
Transformers are Good Clusterers for Lifelong User Behavior Sequence Modeling
Modeling user long-term behavior sequences is critical for enhancing Click-Through Rate (CTR) prediction. Existing methods typically employ two cascaded search units, a General Search Unit (GSU) for rapid retrieval and an Exact Search Unit (ESU) for precise modeling, to balance efficiency and effectiveness. However, they are constrained to recent behaviors due to computational limitations. Clustering user behaviors offers a potential solution, enabling GSU to access lifelong behaviors while maintaining inference efficiency, but current clustering approaches often lack generalizability, or fail to remain effective in high-dimensional data due to non-end-to-end clustering and recommendation. Given that centroids in clustering group similar data points based on proximity, similar to how queries function in transformers, we can integrate the learning of queries with CTR tasks in an end-to-end manner, shifting clustering from meaningless Euclidean distances to meaningful semantic distances. Therefore, we propose C-Former, a transformer-based clustering model specifically designed for modeling lifelong behavior sequences. The C-Former encoder leverages a group of learnable clustering anchor points that access the lifelong user behaviors to extract personalized interests. Then, the C-Former decoder reconstructs lifelong user behaviors based on the compact output of the encoder. The reconstruction and orthogonal loss ensure that centroids are informative and diverse in capturing user preferences. Clustering is further guided by supervisory signals from CTR, establishing an end-to-end framework. The proposed C-Former achieves linear time complexity in training with respect to sequence length and significantly reduces inference latency by directly utilizing cached centroids. Experiments on four benchmark datasets demonstrate the effectiveness of C-Former for lifelong user behavior sequence modeling. The code is available at https://github.com/pepsi2222/C-Former.
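A minimal sketch of the "learnable clustering anchors as queries" idea is given below: a fixed set of learnable anchor vectors cross-attends over the behavior sequence and returns a compact set of interest centroids. The module name, sizes, and single-attention design are assumptions; C-Former's decoder, losses, and centroid caching are not shown.

```python
import torch
import torch.nn as nn

class ClusteringAnchors(nn.Module):
    """Learnable anchor queries that cross-attend over a (possibly very long)
    behavior sequence to produce a fixed number of interest centroids."""

    def __init__(self, num_anchors=16, dim=64, heads=4):
        super().__init__()
        self.anchors = nn.Parameter(torch.randn(num_anchors, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, behavior_seq):                       # (B, L, dim)
        q = self.anchors.unsqueeze(0).expand(behavior_seq.size(0), -1, -1)
        centroids, _ = self.attn(q, behavior_seq, behavior_seq)
        return centroids                                   # (B, num_anchors, dim)
```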
CLUE: Using Large Language Models for Judging Document Usefulness in Web Search Evaluation
The widely adopted Cranfield paradigm fails to adequately capture user satisfaction due to a weak relevance-satisfaction correlation. Additionally, constructing test collections incurs high relevance annotation costs. To address these two limitations, we aim to explore the use of large language models (LLMs) to generate multilevel usefulness labels. We propose CLUE, a user-centric evaluation method that explicitly incorporates users' search context and behavior information into LLMs. Inspired by ordinal regression, it employs a cascade structure tailored for multilevel usefulness judgments. Our study shows that using CLUE, LLMs can effectively assess usefulness when provided with search context and behavior, outperforming third-party labeling methods. We also conduct ablation studies to explore the impact of each component in CLUE. Finally, we utilize the usefulness labels generated by CLUE to predict user satisfaction. Real-world experiments reveal that incorporating CLUE's usefulness labels significantly enhances the performance of the satisfaction prediction model.
Revisiting Trajectories to Road: A New Diffusion Model and A New Dataset with 1,000,000,000 Points
Rapid urban growth and road network evolution make it increasingly difficult to maintain accurate digital maps. Traditional manual or satellite-based updates are often delayed or insufficiently detailed. The trajectory-to-road (T2R) task addresses these limitations by leveraging GPS trajectory data to reconstruct up-to-date road networks, providing a scalable solution for navigation, ride-hailing, and urban planning. Existing T2R methods face significant challenges due to their reliance on numerous statistical features and limited generative capabilities. Additionally, current datasets are often outdated and come from a single mobility source, leading to biased urban dynamics and poor generalizability. To address these issues, we introduce DiffusionT2R, the first diffusion-based framework for T2R. DiffusionT2R leverages three key innovations: a multi-channel trajectory representation to provide fine-grained conditioning for guiding the denoising process; a Dual-Level Mixture of Filters that enhances feature extraction at both local and global scales; and a consistency constraint to ensure spatial alignment with input trajectories, preserving road network realism. We also present the largest available trajectory dataset with up-to-date road networks, diverse mobility patterns and high-quality filtering. Experimental results show that DiffusionT2R outperforms existing methods, delivering accurate, realistic, and generalizable road networks with improved robustness in real-world scenarios. The dataset TXBJ is available at https://github.com/ywyangwang/TXBJ.
Generative Data Augmentation in Graph Contrastive Learning for Recommendation
Recommendation systems have become indispensable in various online platforms, from e-commerce to streaming services. A fundamental challenge in this domain is learning effective embeddings from sparse user-item interactions. While contrastive learning has recently emerged as a promising solution to this issue, the random data augmentation methods most existing approaches use to generate augmented views often alter the original semantic information. In this paper, we propose a novel framework, GDA4Rec (Generative Data Augmentation in graph contrastive learning for Recommendation), to generate high-quality augmented views and provide robust self-supervised signals. Specifically, we employ a noise generation module that leverages deep generative models to approximate the distribution of the original data for data augmentation. Additionally, GDA4Rec extracts an item complement matrix to characterize the latent correlations between items and provide additional self-supervised signals. Lastly, a joint objective that integrates recommendation, data augmentation and contrastive learning is used to encourage the model to learn more effective and informative embeddings. Extensive experiments are conducted on three public datasets to demonstrate the superiority of the model. The code is available at: https://github.com/MrYansong/GDA4Rec.
Towards Reliable GNNs: Adversarial Calibration Learning for Confidence Estimation
Graph neural networks (GNNs) have achieved strong predictive performance across a range of tasks, yet they often exhibit poor confidence calibration, where the predicted confidence scores do not accurately reflect the true likelihood of correctness. This shortcoming raises concerns about their reliability in critical domains such as fraud detection and risk assessment, where well-calibrated predictions are essential for sound decision-making. Although several calibration methods have been proposed for GNNs, our experiments reveal that they tend to focus on global calibration while failing to generalize across different node groups, such as those defined by degree, class, or local structural patterns. In some cases, these methods even degrade calibration performance compared to the original uncalibrated models. To address this limitation, we introduce AdvCali, a novel framework that adaptively improves calibration across diverse node groups. AdvCali employs adversarial training to automatically identify miscalibrated groups and incorporates a differentiable Group Expected Calibration Error (ECE) loss to refine confidence estimates within them. This enables the model to adjust its calibration strategy dynamically, without relying on prior knowledge of which node groups are miscalibrated. Extensive experiments on real-world datasets show that AdvCali not only improves global calibration but also significantly enhances calibration within groups defined by feature similarity, graph topology, and connectivity patterns, outperforming existing approaches.
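Because standard ECE uses hard binning and is not differentiable, a group-wise calibration penalty like the one described above is usually built from a soft relaxation. The sketch below is one such relaxation under assumed group labels and a soft-binning temperature; it illustrates the idea rather than reproducing AdvCali.

```python
# Hedged sketch of a differentiable group-wise ECE-style loss (illustrative assumptions throughout).
import torch
import torch.nn.functional as F

def soft_group_ece(logits, labels, groups, n_bins: int = 10, temp: float = 50.0):
    probs = F.softmax(logits, dim=-1)
    conf, pred = probs.max(dim=-1)
    acc = (pred == labels).float()
    centers = (torch.arange(n_bins, dtype=conf.dtype, device=conf.device) + 0.5) / n_bins
    # Soft, differentiable assignment of each sample to confidence bins.
    weights = F.softmax(-temp * (conf.unsqueeze(1) - centers) ** 2, dim=1)   # (N, n_bins)
    loss = conf.new_zeros(())
    for g in groups.unique():
        w = weights[groups == g]
        c = conf[groups == g].unsqueeze(1)
        a = acc[groups == g].unsqueeze(1)
        mass = w.sum(0) + 1e-8
        gap = (w * c).sum(0) / mass - (w * a).sum(0) / mass   # per-bin confidence-accuracy gap
        loss = loss + (mass / mass.sum() * gap.abs()).sum()
    return loss / groups.unique().numel()

logits = torch.randn(256, 5, requires_grad=True)
labels, groups = torch.randint(0, 5, (256,)), torch.randint(0, 3, (256,))  # groups, e.g., degree buckets
soft_group_ece(logits, labels, groups).backward()
```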
Understanding the Embedding Models on Hyper-relational Knowledge Graph
Recently, Hyper-relational Knowledge Graphs (HKGs) have been proposed as an extension of traditional Knowledge Graphs (KGs) to better represent real-world facts with additional qualifiers. As a result, researchers have attempted to adapt classical Knowledge Graph Embedding (KGE) models for HKGs by designing extra qualifier processing modules. However, it remains unclear whether the superior performance of Hyper-relational KGE (HKGE) models arises from their base KGE model or the specially designed extension module. In this paper, we convert HKGs into KG format at the data level using decomposition methods and then evaluate the performance of several classical KGE models on HKGs. Our results show that some KGE models achieve comparable performance to HKGE models. Upon further analysis, we find that the decomposition methods alter the original HKG topology and fail to fully preserve HKG information. Moreover, we observe that current HKGE models are either insufficient in capturing the graph's long-range dependency or struggle to integrate main-triple and qualifier information due to the information compression issue. To further justify our findings and provide a direction for HKGE research, we propose FormerGNN, which employs a qualifier integrator to preserve the original HKG topology, a GNN-based graph encoder to capture the graph's long-range dependencies, and an improved approach for integrating main-triple and qualifier information to mitigate compression issues. Our experimental results demonstrate that FormerGNN outperforms existing HKGE models.
PRIMA: Privacy preserving Multi-dimensional Analytic Approach
The sum query is an important and fundamental operator for online analytical processing. In this paper, we focus on answering sum queries over data cubes, each of which consists of a collection of cuboids, while satisfying differential privacy (DP). Existing works fail to process sum queries in online analytical processing with high utility due to the queries' high sensitivity and noise aggregation: constructing a base cuboid requires the data curator to answer a workload of linear sum queries under DP in advance, whose sensitivity will result in a large amount of DP noise, and the noise will finally be aggregated when constructing the remaining cuboids. To this end, we present a Differentially PRIvate Multi-dimensional Analytic Approach (PRIMA). In PRIMA, we propose a Symmetric Bounded Sum Query Processing Method (SBS), which reduces the sensitivity of sum queries by bounding both the maximum and minimum contribution of each record in the data table in a symmetric manner. Moreover, we propose a Hypothesis Testing based Prefix Sum Computing Method (SCOPE) to compute a base prefix-sum cuboid based on hypothesis testing. By employing the base prefix-sum cuboid, any remaining cuboid can be constructed with constant pieces of DP noise aggregated. We conduct experiments on both real-world and synthetic datasets. Experimental results confirm the effectiveness of PRIMA over existing works.
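The constant-noise claim above is easiest to see with a prefix-sum table: once noise has been added to the prefix sums, any range sum is answered from a fixed number of cells by inclusion-exclusion. The toy sketch below illustrates only that mechanism; the noise calibration is not a faithful DP accounting (each record influences many prefix cells), and it is not PRIMA's SBS or SCOPE method.

```python
# Toy prefix-sum cuboid with Laplace noise; answers come from 4 cells regardless of range size.
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 10, size=(100, 100)).astype(float)   # toy 2-D base cuboid
epsilon, sensitivity = 1.0, 10.0                             # assumed bounded per-record contribution

prefix = data.cumsum(axis=0).cumsum(axis=1)
# NOTE: adding i.i.d. noise per prefix cell is illustrative only, not a complete DP mechanism.
noisy_prefix = prefix + rng.laplace(scale=sensitivity / epsilon, size=prefix.shape)

def range_sum(r1, r2, c1, c2):
    """Noisy sum over rows r1..r2 and columns c1..c2 (inclusive), via inclusion-exclusion."""
    total = noisy_prefix[r2, c2]
    if r1 > 0:
        total -= noisy_prefix[r1 - 1, c2]
    if c1 > 0:
        total -= noisy_prefix[r2, c1 - 1]
    if r1 > 0 and c1 > 0:
        total += noisy_prefix[r1 - 1, c1 - 1]
    return total

print(range_sum(10, 19, 20, 29), data[10:20, 20:30].sum())   # noisy vs. exact answer
```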
ORCAS: Obfuscation-Resilient Binary Code Similarity Analysis using Dominance Enhanced Semantic Graph
Binary code similarity analysis (BCSA) serves as a foundational technique for binary analysis tasks such as vulnerability detection and malware identification. Existing graph-based BCSA approaches capture more binary code semantics and demonstrate remarkable performance. However, when code obfuscation is applied, the unstable control flow structure degrades their performance. To address this issue, we develop ORCAS, an Obfuscation-Resilient BCSA model based on the Dominance Enhanced Semantic Graph (DESG). The DESG is a novel binary code representation that captures more of a binary's implicit semantics without relying on the control flow structure, including inter-instruction relations (e.g., def-use), inter-basic block relations (i.e., dominance and post-dominance), and instruction-basic block relations. ORCAS takes binary functions from different obfuscation options, optimization levels, and instruction set architectures as input and scores their semantic similarity more robustly. Extensive experiments have been conducted comparing ORCAS against eight baseline approaches on the BinKit dataset. For example, ORCAS achieves an average 12.1% PR-AUC improvement over the state-of-the-art approaches when three obfuscation options are combined. In addition, a new obfuscated real-world vulnerability dataset has been constructed and released to facilitate more comprehensive research on obfuscated binary code analysis. ORCAS outperforms the state-of-the-art approaches on this newly released real-world vulnerability dataset by up to 43% in recall.
Empowering Large Language Model for Sequential Recommendation via Multimodal Embeddings and Semantic IDs
Sequential recommendation (SR) aims to capture users' dynamic interests and sequential patterns based on their historical interactions. Recently, the powerful capabilities of large language models (LLMs) have driven their adoption in SR. However, we identify two critical challenges in existing LLM-based SR methods: 1) embedding collapse when incorporating pre-trained collaborative embeddings and 2) catastrophic forgetting of quantized embeddings when utilizing semantic IDs. These issues limit model scalability and lead to suboptimal recommendation performance. Therefore, based on LLMs like Llama3-8B-instruct, we introduce a novel SR framework named MME-SID, which integrates multimodal embeddings and quantized embeddings to mitigate embedding collapse. Additionally, we propose a Multimodal Residual Quantized Variational Autoencoder (MM-RQ-VAE) with maximum mean discrepancy as the reconstruction loss and contrastive learning for alignment, which effectively preserve intra-modal distance information and capture inter-modal correlations, respectively. To further alleviate catastrophic forgetting, we initialize the model with the trained multimodal code embeddings. Finally, we fine-tune the LLM efficiently using LoRA in a multimodal frequency-aware fusion manner. Extensive experiments on three public datasets validate the superior performance of MME-SID thanks to its capability to mitigate embedding collapse and catastrophic forgetting. The implementation code and datasets are publicly available for reproduction: https://github.com/Applied-Machine-Learning-Lab/MME-SID.
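For readers unfamiliar with maximum mean discrepancy, the reconstruction loss named above can be written in a few lines; the multi-bandwidth RBF kernel below is a common but assumed choice, not the MME-SID code.

```python
# Minimal (biased) RBF-kernel MMD between two batches of embeddings.
import torch

def mmd_rbf(x: torch.Tensor, y: torch.Tensor, bandwidths=(1.0, 2.0, 4.0)) -> torch.Tensor:
    """MMD^2 estimate between batches of shape (n, d) and (m, d); diagonal terms kept for brevity."""
    def kernel(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return sum(torch.exp(-d2 / (2 * s ** 2)) for s in bandwidths) / len(bandwidths)
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

recon = torch.randn(32, 64, requires_grad=True)
target = torch.randn(32, 64)
mmd_rbf(recon, target).backward()
```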
Beyond Surface Similarity: A Riemannian Hierarchical Ranking Framework for Sociological Concept Equivalence
Vocabularies such as the European Language Social Science Thesaurus (ELSST) and the CLOSER ontology are foundational taxonomies that capture the core social science concepts underpinning large-scale longitudinal social science surveys. However, standard text embeddings, which rely on surface similarity, often fail to capture the complex hierarchical and relational structures of sociological concepts. In this work, we propose a framework to model these nuances by adapting a large language model (LLM)-based text embedding model with a learnable diagonal Riemannian metric. This metric allows for a flexible geometry in which dimensions can be scaled to reflect semantic importance. Additionally, we introduce a Hierarchical Ranking Loss with dynamic margins as the sole training objective to enforce the multi-level hierarchical constraints from ELSST (e.g., distinguishing 'self' from narrower, broader, or related concepts, and all of these from 'unrelated' ones) within the Riemannian space. For instance, a concept such as 'social stratification' should be embedded closer to 'social inequality' (its broader, related concept) and substantially further from an unrelated concept such as 'particle physics'. Lastly, we show that our parameter-efficient approach significantly outperforms strong contrastive learning and hyperbolic embedding baselines on hierarchical concept retrieval and classification tasks using the ELSST and CLOSER datasets. Visualizations confirm that the learned embedding space exhibits a clear hierarchical structure. Our work offers a more accurate and geometrically informed method for representing complex sociological constructs.
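A compact way to picture the geometry described above is a per-dimension scaling of distances combined with a margin-based ranking objective; the sketch below uses illustrative names and a fixed margin, whereas the paper's loss uses dynamic margins per relation level.

```python
# Hedged sketch: learnable diagonal metric plus a margin ranking loss over concept triples.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiagonalMetric(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(dim))      # per-dimension semantic importance

    def distance(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        w = self.log_scale.exp()                              # positive diagonal metric
        return ((a - b) ** 2 * w).sum(dim=-1).sqrt()

def ranking_loss(metric, anchor, closer, farther, margin: float = 0.5):
    """Push the closer relation (e.g., a broader concept) nearer than the farther one (e.g., unrelated)."""
    return F.relu(metric.distance(anchor, closer) - metric.distance(anchor, farther) + margin).mean()

metric = DiagonalMetric(768)
a, p, n = (torch.randn(16, 768) for _ in range(3))
print(ranking_loss(metric, a, p, n))
```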
TimeRAG: Enhancing Complex Temporal Reasoning with Search Engine Augmentation
While Large Language Models (LLMs) augmented with search engines have achieved remarkable progress in open-domain question answering, their ability to adapt to a rapidly evolving world remains limited. A critical challenge lies in the need for complex temporal reasoning to answer real-world questions. Current Retrieval-Augmented Generation (RAG) methods primarily focus on retrieving the latest information but often fail to perform sophisticated temporal reasoning. To address this gap, we propose TimeRAG, a novel RAG framework designed to dynamically handle complex temporal reasoning tasks. TimeRAG operates through the iterative collaboration of two modules: (1) a temporal-semantic Query Decomposition (QD) module, which breaks down the original question into atomic time-event sub-questions to guide multi-step retrieval, and (2) a time-aware Answer Generation (AG) module, which analyzes temporal contexts, generates intermediate answers with confidence scores, and synthesizes the final answer upon reasoning completion. The system is trained in three stages: (1) time-aware supervised fine-tuning of the AG module, (2) imitation learning for the QD module to enhance temporal decomposition ability, and (3) reinforcement learning for end-to-end joint optimization to enhance temporal coherence across the entire system. Evaluations on three challenging benchmarks show that TimeRAG significantly outperforms existing methods, particularly on questions involving fast-changing real-world events and those grounded in false premises that require detection and correction of outdated or incorrect assumptions.
Forecasting at Full Spectrum: Holistic Multi-Granular Traffic Modeling under High-Throughput Inference Regimes
Current intelligent transportation systems rely heavily on accurate traffic forecasting and swift inference to make timely decisions. While Graph Convolutional Networks (GCNs) have shown benefits in modeling complex traffic dependencies, existing GCN-based approaches cannot fully extract and fuse multi-granular spatiotemporal features across spatial and temporal scales, which has been shown to yield less accurate results. As extracting multi-granular features across scales has been a promising strategy in domains such as computer vision, natural language processing, and time-series forecasting, pioneering studies have attempted to leverage a similar mechanism for spatiotemporal traffic data mining. However, the additional feature extraction branches introduced in prior studies substantially increase model complexity and extend inference time, making it challenging to provide fast forecasts. In this paper, we propose MultiGran-STGCNFog, an efficient fog-distributed inference system with a novel traffic forecasting model that employs multi-granular spatiotemporal feature fusion on generated dynamic traffic graphs to fully capture interdependent traffic dynamics. The proposed scheduling algorithm GA-DPHDS, which optimizes the layer execution order and the layer-device scheduling scheme simultaneously, contributes to considerable inference throughput improvement by coordinating heterogeneous fog devices in a pipelined manner. Extensive experiments on real-world datasets demonstrate the superiority of the proposed method over selected GCN baselines.
Bridging Thoughts and Words: Graph-Based Intent-Semantic Joint Learning for Fake News Detection
Fake news detection is an important and challenging task for defending online information integrity. Existing state-of-the-art approaches typically extract news semantic clues, such as writing patterns that include emotional words, stylistic features, etc. However, detectors tuned solely to such semantic clues can easily fall into surface detection patterns, which can shift rapidly in dynamic environments, leading to limited performance in the evolving news landscape. To address this issue, this paper investigates a novel perspective by incorporating news intent into fake news detection, bridging intents and semantics together. The core insight is that by considering news intents, one can deeply understand the inherent thoughts behind news deception, rather than the surface patterns within words alone. To achieve this goal, we propose Graph-based INtent-Semantic joInt moDEling (InSide) for fake news detection, which models deception clues from both semantic and intent signals via graph-based joint learning. Specifically, InSide reformulates news semantic and intent signals into heterogeneous graph structures, enabling long-range context interaction through entity guidance and capturing both holistic and implementation-level intent via coarse-to-fine intent modeling. To achieve better alignment between semantics and intents, we further develop a dynamic pathway-based graph alignment strategy for effective message passing and aggregation across these signals by establishing a common space. Extensive experiments on four benchmark datasets demonstrate the superiority of the proposed InSide compared to state-of-the-art methods.
Learning from Graph: Mitigating Label Noise on Graph through Topological Feature Reconstruction
Graph Neural Networks (GNNs) have shown remarkable performance in modeling graph data. However, labeling graph data typically relies on unreliable information, leading to noisy node labels. Existing approaches for GNNs under Label Noise (GLN) employ supervision signals beyond noisy labels for robust learning. While empirically effective, they tend to over-rely on supervision signals built upon external assumptions, leading to restricted applicability. In this work, we shift the focus to exploring how to extract useful information and learn from the graph itself, thus achieving robust graph learning. From an information theory perspective, we theoretically and empirically demonstrate that the graph itself contains reliable information for graph learning under label noise. Based on these insights, we propose the Topological Feature Reconstruction (TFR) method. Specifically, TFR leverages the fact that the pattern of clean labels can more accurately reconstruct graph features through topology, while noisy labels cannot. TFR is a simple and theoretically guaranteed model for robust graph learning under label noise. We conduct extensive experiments across datasets with varying properties. The results demonstrate the robustness and broad applicability of our proposed TFR compared to state-of-the-art baselines. Codes are available at https://github.com/eaglelab-zju/TFR.
Exploration and Visualization of a Legal Knowledge Graph: A Human-Centered Approach
Building applications for users in the legal domain is challenging due to their strict requirements. In this work, we present a prototype combining search in legal norms, legal cases, and legal textbooks with traversal options for the references between these documents, modeled in a knowledge graph in the backend. We conducted a usability test with law students as one of the main target user groups to evaluate the prototype. The usability test comprised user surveys, retrieval test tasks, and a law exam. In our study (n = 20), prototype users solved significantly more retrieval tasks than controls (M = 12.7 vs. 7.5, p = .006) and rated usability at 62.3 SUS points, identifying clear advantages in citation traversal and textbook reference tasks. While exam scores showed no statistically significant difference, qualitative feedback confirmed improved efficiency and satisfaction. We also compare user performance to Large Language Models (LLMs) in vanilla and Retrieval-Augmented Generation configurations, motivated by participant interest in AI-assisted features.
CAGCL: A Community-Aware Graph Contrastive Learning Model for Social Bot Detection
Malicious social bot detection is vital for social network security. While graph neural networks (GNNs) based methods have improved performance by modeling structural information, they often overlook latent community structures, resulting in homogeneous node representations. Leveraging community structures, which capture discriminative group-level patterns, is therefore essential for more robust detection. In this paper, we propose a new Community-Aware Graph Contrastive Learning (CAGCL) framework for enhanced social bot detection. Specifically, CAGCL first exploits the latent community structures to uncover the potential group-level patterns. Then, a dual-perspective community enhancement module is proposed, which strengthens the structural awareness and reinforces topological consistency within communities, thereby enabling more distinctive node representations and deeper intra-community message passing. Finally, a community-aware contrastive learning module is proposed, which considers nodes within the same community as positive pairs and those from different communities as negative pairs, enhancing the discriminability of node representations. Extensive experiments conducted on multiple benchmark datasets demonstrate that CAGCL consistently outperforms state-of-the-art baselines. The code is available at https://github.com/cgao-comp/.
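The community-aware contrastive objective described above is close in spirit to supervised contrastive learning with community ids as labels. A minimal sketch of that variant follows; it is an illustration rather than the CAGCL release.

```python
# Hedged sketch: InfoNCE-style loss where same-community nodes are positives.
import torch
import torch.nn.functional as F

def community_contrastive(z: torch.Tensor, community: torch.Tensor, tau: float = 0.5):
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / tau                                    # (n, n) similarity logits
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    pos = (community.unsqueeze(0) == community.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim.masked_fill(self_mask, float('-inf')), dim=1, keepdim=True)
    pos_counts = pos.sum(1)
    valid = pos_counts > 0                                   # skip nodes with no positive pair
    loss = -(log_prob * pos.float()).sum(1)[valid] / pos_counts[valid]
    return loss.mean()

z = torch.randn(64, 32, requires_grad=True)
community = torch.randint(0, 4, (64,))
community_contrastive(z, community).backward()
```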
GCLS2: Towards Efficient Community Detection Using Graph Contrastive Learning with Structure Semantics
Due to the power of learning representations from unlabeled graphs, graph contrastive learning (GCL) has shown excellent performance in community detection tasks. Existing GCL-based methods for community detection usually focus on learning attribute representations of individual nodes, which, however, ignores the structure semantics of communities (e.g., nodes in the same community should be structurally cohesive). Therefore, in this paper, we consider community detection under community structure semantics and propose an effective framework for graph contrastive learning under structure semantics (GCLS2) to detect communities. To seamlessly integrate the interior-dense and exterior-sparse characteristics of communities with our contrastive learning strategy, we employ classic community structures to extract high-level structural views and design a structure semantic expression module to augment the original structural feature representation. Moreover, we formulate a structure contrastive loss to optimize the feature representation of nodes, which can better capture the topology of communities. To adapt to large-scale networks, we design a high-level graph partitioning (HGP) algorithm that minimizes the community detection loss for GCLS2 online training. It is worth noting that we prove a lower bound on the training of GCLS2 from the perspective of information theory, explaining why GCLS2 can learn a more accurate representation of the structure. Extensive experiments have been conducted on various real-world graph datasets and confirm that GCLS2 outperforms nine state-of-the-art methods in terms of accuracy, modularity, and efficiency in detecting communities.
DANet: A RAG-inspired Dual Attention Model for Few-shot Time Series Prediction
Practical applications often require forecasting the future states of short time series (STS) using multiple related long time series (LTS) as auxiliary data, a process known as few-shot prediction. The primary challenge, given the limited data on STS, is effectively capturing the pattern similarities between STS and LTS. Current methods, despite notable advancements, primarily focus on transferring pattern characteristics from LTS to STS without explicitly addressing their similarities at various levels. To overcome this limitation, we propose a novel few-shot time series forecasting model called DANet. Drawing on the Retrieval-Augmented Generation (RAG) framework in large language models, DANet retrieves long and short sequences from LTS that closely resemble STS, thereby enhancing prediction accuracy while simultaneously reducing uncertainty through this retrieval process. First, we define two metrics to quantify pattern similarities between STS and LTS, addressing the issue of different representations of the same pattern due to variations in sequence length. Second, we propose a dual-attention mechanism which embeds the two similarity metrics to extract and integrate long and short sequences from LTS across variable and temporal levels for generating predictions. Our experiments across six scenarios show that DANet significantly outperforms six state-of-the-art (SOTA) methods.
FedFMD: Fairness-Driven Adaptive Aggregation in Federated Learning via Mahalanobis Distance
Federated learning (FL) facilitates collaborative global model training without compromising data privacy. However, data distribution variations among clients inevitably introduce bias in global updates, impacting model fairness and performance. Existing methods assign client aggregation weights simply based on dataset size proportions or rely on substantial assumptions about specific global data distributions such as uniform label distributions. These approaches inadequately capture the intrinsic impact of Non-IID data characteristics on model divergence. To address these deficiencies, we propose FedFMD, a novel adaptive weight allocation algorithm that leverages the Mahalanobis distance and integrates Task Arithmetic to dynamically assign weights based on client contributions. FedFMD explicitly models task-centric deviations caused by data heterogeneity without requiring raw data access or prior distribution assumptions. In addition, FedFMD refines aggregation weight computation through time-decay adjustments guided by historical client performance trends, optimizing both fairness and utility. Extensive evaluations against six state-of-the-art (SOTA) algorithms and two distance metrics across three datasets demonstrate the superior performance of FedFMD in fairness and utility.
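To make the weighting idea concrete, the sketch below scores each client's flattened update by its Mahalanobis distance from the consensus and maps closer clients to larger weights; the covariance estimate, the exponential mapping, and the omission of Task Arithmetic and time decay are all simplifications, not FedFMD itself.

```python
# Hedged sketch of Mahalanobis-distance-based aggregation weights.
import numpy as np

def aggregation_weights(client_updates: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """client_updates: (n_clients, n_params) flattened model deltas."""
    centered = client_updates - client_updates.mean(axis=0)
    cov = centered.T @ centered / len(client_updates) + eps * np.eye(client_updates.shape[1])
    cov_inv = np.linalg.inv(cov)
    d = np.sqrt(np.einsum('ij,jk,ik->i', centered, cov_inv, centered))   # Mahalanobis distances
    w = np.exp(-d)                                    # closer to consensus => larger weight
    return w / w.sum()

updates = np.random.randn(5, 20)
print(aggregation_weights(updates))
```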
CHEM: Causally and Hierarchically Explaining Molecules
Graph Neural Networks (GNNs) have significantly advanced the analysis of graph-structured data; however, their explainability remains challenging, affecting their applicability in critical domains such as medicine and pharmacology. In particular, violating the subgraph structure can degrade model interpretability and generalization performance. To address this problem, we propose a hierarchical and explainable causal inference-based GNN. Our model selects features based on explainable subgraph units informed by prior knowledge. Our method begins by clustering molecules into functional groups via the BRICS algorithm and then constructs a hierarchical structure at both the node and motif levels. The proposed model employs a gate module that distills causal features at the motif level and a loss function that disconnects information flow from non-causal features to the target level. The classification results on real-world molecular graphs demonstrate that our model outperforms other causal inference-based GNN models. In addition, we confirm that leveraging molecular docking data enables the proposed model to effectively identify true causal substructures.
ST-Hyper: Learning High-Order Dependencies Across Multiple Spatial-Temporal Scales for Multivariate Time Series Forecasting
In multivariate time series (MTS) forecasting, many deep learning-based methods have been proposed for modeling dependencies at multiple spatial (inter-variate) or temporal (intra-variate) scales. However, existing methods may fail to model dependencies across multiple spatial-temporal scales (ST-scales, i.e., scales that jointly consider spatial and temporal scopes). In this work, we propose ST-Hyper to model the high-order dependencies across multiple ST-scales through adaptive hypergraph modeling. Specifically, we introduce a Spatial-Temporal Pyramid Modeling (STPM) module to extract features at multiple ST-scales. Furthermore, we introduce an Adaptive Hypergraph Modeling (AHM) module that learns a sparse hypergraph to capture robust high-order dependencies among features. In addition, these features interact through tri-phase hypergraph propagation, which comprehensively captures multi-scale spatial-temporal dynamics. Experimental results on six real-world MTS datasets demonstrate that ST-Hyper achieves state-of-the-art performance, outperforming the best baselines with an average MAE reduction of 3.8% and 6.8% for long-term and short-term forecasting, respectively. Code is available at https://anonymous.4open.science/ST-Hyper-83E7.
MillGNN: Learning Multi-Scale Lead-Lag Dependencies for Multi-Variate Time Series Forecasting
Multi-variate time series (MTS) forecasting is crucial for various applications. Existing methods have shown promising results owing to their strong ability to capture intra- and inter-variate dependencies. However, these methods often overlook lead-lag dependencies at multiple grouping scales, failing to capture hierarchical lead-lag effects in complex systems. To this end, we propose MillGNN, a novel graph neural network-based method that learns multiple grouping scale lead-lag dependencies for MTS forecasting, which can comprehensively capture lead-lag effects considering variate-wise and group-wise dynamics and decays. Specifically, MillGNN introduces two key innovations: (1) a scale-specific lead-lag graph learning module that integrates cross-correlation coefficients and dynamic decaying features derived from real-time inputs and time lags to learn lead-lag dependencies for each scale, which can model evolving lead-lag dependencies with statistical interpretability and data-driven flexibility; (2) a hierarchical lead-lag message passing module that passes lead-lag messages at multiple grouping scales in a structured way to simultaneously propagate intra- and inter-scale lead-lag effects, which can capture multi-scale lead-lag effects with a balance of comprehensiveness and efficiency. Experimental results on 11 datasets demonstrate the superiority of MillGNN for long-term and short-term MTS forecasting, compared with 16 state-of-the-art methods.
STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning
While modern recommender systems are instrumental in navigating information abundance, they remain fundamentally limited by static user modeling and reactive decision-making paradigms. Current large language model (LLM)-based agents inherit these shortcomings through their overreliance on heuristic pattern matching, yielding recommendations prone to shallow correlation bias, limited causal inference, and brittleness in sparse-data scenarios. We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities. Each user is modeled as an agent with parallel cognitions: fast response for immediate interactions and slow reasoning that performs chain-of-thought rationales. To cultivate intrinsic slow thinking, we develop anchored reinforcement training, a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping. This hybrid approach scaffolds agents in acquiring foundational capabilities (preference summarization, rationale generation) while enabling dynamic policy adaptation through simulated feedback loops. Experiments on MovieLens 1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains compared with state-of-the-art baselines, despite using only 0.4% of the full training data.
Spatio-Temporal Forecasting under Open-World Missingness with Adaptive Mixture-of-Experts
Spatio-temporal forecasting is crucial for sustainable urban development and societal decision-making. However, real-world spatio-temporal data often exhibit open-world missingness: missing rates and patterns evolve dynamically across time and space, severely disrupting dependencies and challenging accurate forecasting. Traditional methods universally overlook the dynamic nature of missingness, resulting in degraded predictive accuracy. To address this gap, we propose a novel Spatio-Temporal Missing-aware Mixture-of-Experts (STMMoE) architecture, equipped with a three-stage training strategy. STMMoE dynamically adapts to varying missing rates through a gating mechanism that selects specialized expert branches. The three-stage training strategy improves end-to-end forecasting performance by aligning the representations of complete and missing data. Extensive experiments on two real-world datasets show that our method achieves state-of-the-art performance.
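A missing-rate-conditioned gate over expert branches, the core mechanism named above, can be sketched in a few lines; the expert architecture and gate inputs here are deliberately simplified and are not the STMMoE design.

```python
# Hedged sketch: mixture-of-experts whose gate depends only on observed-mask statistics.
import torch
import torch.nn as nn

class MissingAwareMoE(nn.Module):
    def __init__(self, d_in: int, d_out: int, n_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(d_in, d_out) for _ in range(n_experts)])
        self.gate = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, n_experts))

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_in) imputed inputs; mask: (batch, d_in), 1 = observed, 0 = missing.
        missing_rate = 1.0 - mask.float().mean(dim=1, keepdim=True)      # (batch, 1)
        weights = torch.softmax(self.gate(missing_rate), dim=-1)         # (batch, n_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=1)          # (batch, n_experts, d_out)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)

moe = MissingAwareMoE(d_in=8, d_out=4)
x, mask = torch.randn(2, 8), torch.randint(0, 2, (2, 8))
print(moe(x, mask).shape)
```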
Gravity-GNN: Deep Reinforcement Learning Guided Space Gravity-based Graph Neural Network
Graph Neural Networks (GNNs) have demonstrated remarkable capabilities in handling graph data. Typically, GNNs recursively aggregate node information, including node features and local topological information, through a message-passing scheme. However, most existing GNNs are highly sensitive to neighborhood aggregation, and irrelevant information in the graph topology can lead to inefficient or even invalid node embeddings. To overcome these challenges, we propose a novel Space Gravity-based Graph Neural Network (Gravity-GNN) guided by Deep Reinforcement Learning (DRL). In particular, we introduce a novel similarity measure called ''node gravity'', inspired by the gravitational force between particles in space, to compare nodes within graph data. Furthermore, we employ DRL technology to learn and select the most suitable number of adjacent nodes for each node. Our experimental results on various real-world datasets demonstrate that Gravity-GNN outperforms state-of-the-art methods regarding node classification accuracy, while exhibiting greater robustness against disturbances.
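The gravity analogy above can be made concrete with a toy score in which node "mass" and pairwise distance stand in for the physical quantities; how Gravity-GNN actually defines mass and distance is not reproduced here, so degree-as-mass and shortest-path distance are purely illustrative assumptions.

```python
# Toy gravity-style node similarity (illustrative assumptions, not the paper's definition).
import networkx as nx

def node_gravity(G: nx.Graph, u, v) -> float:
    """Gravity-like score: product of node 'masses' over squared graph distance."""
    mass_u, mass_v = G.degree[u], G.degree[v]
    try:
        dist = nx.shortest_path_length(G, u, v)
    except nx.NetworkXNoPath:
        return 0.0
    return float('inf') if dist == 0 else mass_u * mass_v / dist ** 2

G = nx.karate_club_graph()
print(node_gravity(G, 0, 33))
```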
TLCCSP: A Scalable Framework for Enhancing Time Series Forecasting with Time-Lagged Cross-Correlations
Time series forecasting is critical across various domains, such as weather, finance and real estate forecasting, as accurate forecasts support informed decision-making and risk mitigation. While recent deep learning models have improved predictive capabilities, they often overlook time-lagged cross-correlations between related sequences, which are crucial for capturing complex temporal relationships. To address this, we propose the Time-Lagged Cross-Correlations-based Sequence Prediction framework (TLCCSP), which enhances forecasting accuracy by effectively integrating time-lagged cross-correlated sequences. TLCCSP employs the Sequence Shifted Dynamic Time Warping (SSDTW) algorithm to capture lagged correlations and a contrastive learning-based encoder to efficiently approximate SSDTW distances. Experimental results on weather, finance and real estate time series datasets demonstrate the effectiveness of our framework. On the weather dataset, SSDTW reduces mean squared error (MSE) by 16.01% compared with single-sequence methods, while the contrastive learning encoder (CLE) further decreases MSE by 17.88%. On the stock dataset, SSDTW achieves a 9.95% MSE reduction, and CLE reduces it by 6.13%. For the real estate dataset, SSDTW and CLE reduce MSE by 21.29% and 8.62%, respectively. Additionally, the contrastive learning approach decreases SSDTW computational time by approximately 99%, ensuring scalability and real-time applicability across multiple time series forecasting tasks.
Beyond Return Conditioning: Multi-Scale Sequence Modeling and Advantage-Guided Policy Routing for Offline RL
Return-conditioned supervised learning (RCSL) in offline reinforcement learning (RL) leverages Transformers to extract behavioral patterns from offline datasets for decision-making. However, it suffers from inherent limitations in comprehensively capturing multi-scale temporal relationships in historical trajectories. Moreover, its return-conditioning mechanism offers limited guidance in exploiting high-quality behavioral patterns, often resulting in suboptimal action generation during inference. To address these challenges, we propose the Advantage Decision ConvMamba (ADCM), a method that integrates multi-scale sequence modeling (MSSM) with advantage policy guidance (APG). ADCM reconstructs historical sequences through patch partitioning and employs Mamba architecture together with causal convolutions to model sparse global dependencies and dense local Markovian dependencies for behavioral pattern discovery. By incorporating relative advantage action sampling based on the Mixture-of-Experts (MoE) framework, ADCM prioritizes high-quality actions during inference, thereby reducing reliance on low-quality behavioral patterns in the dataset. We evaluate ADCM on multiple offline RL benchmarks from D4RL. Experimental results show that ADCM achieves significant improvements over baseline models, with particularly strong performance on suboptimal datasets. The code for ADCM is available at https://github.com/iTom233/ADCM.git.
Multi-Resource-Aware Admission Control for Online Data Processing
Online data processing platforms offer free-tier services to attract users, where the service provider must strategically utilize limited resources to handle large amounts of low-value requests. Admission control has been leveraged to choose requests to process or skip, yet request uncertainties like unknown rewards and request numbers, and complicated interdependence among multi-resource consumption, further pose challenges. To tackle these challenges, we propose a novel admission control solution, the Online Multi-resource Magician's Admission (OMMA) algorithm, that balances resource consumption with reward accumulation across online requests, while coordinating the intertwined consumption of different resources. OMMA is an online algorithm, with its performance evaluated by its competitive ratio. OMMA attains a competitive ratio of C(1 - K^(-1/2))^M (1 - L^(-1/2))^N, where C is a constant capturing the intrinsic limitation of resource availability, K is the individual resource budget, M is the number of individual resources, L is the joint resource budget, and N is the number of joint resources. This competitive ratio is tight, meaning that there exist problem instances on which OMMA attains exactly this ratio. We implement trace-driven experiments to evaluate the practical performance of OMMA on a real-world LLM prompt dataset, demonstrating the superior performance of OMMA in online data processing services.
Asking Questions with Thoughts: An Efficient Difficulty-Controllable Question Generation Method with Posterior Knowledge Distillation
Difficulty Controllable Question Generation (DCQG) for reading comprehension learns to generate questions for measuring the reading abilities of examinees, playing a crucial role in educational scenarios. This work studies answer-aware DCQG, a challenging task that requires the generated questions to remain faithful to the assigned answer and match the desired difficulty level at the same time. To this end, we first propose an effective two-stage framework, Asking Questions with Thoughts (AQT), to guide a backbone large language model (LLM) to generate questions that are both faithful and difficulty-aware through in-depth self-thinking. Then, we introduce a novel Posterior Knowledge Distillation (PKD) to efficiently fine-tune AQT by distilling knowledge from posterior inference. Finally, to address the scarcity of DCQG datasets, we use an efficient LLM Pretest-based Difficulty Estimation (LP-DE) to automatically construct DCQG datasets from common QG/QA datasets. Extensive experiments show that our methods achieve promising results in terms of both faithfulness and difficulty awareness.
Empowering Denoising Sequential Recommendation with Large Language Model Embeddings
Sequential recommendation aims to capture user preferences by modeling sequential patterns in user-item interactions. However, these models are often influenced by noise such as accidental interactions, leading to suboptimal performance. Therefore, to reduce the effect of noise, some works propose explicitly identifying and removing noisy items. However, we find that simply relying on collaborative information may result in an over-denoising problem, especially for cold items. To overcome these limitations, we propose a novel framework: Interest Alignment for Denoising Sequential Recommendation (IADSR) which integrates both collaborative and semantic information. Specifically, IADSR is comprised of two stages: in the first stage, we obtain the collaborative and semantic embeddings of each item from a traditional sequential recommendation model and an LLM, respectively. In the second stage, we align the collaborative and semantic embeddings and then identify noise in the interaction sequence based on long-term and short-term interests captured in the collaborative and semantic modalities. Our extensive experiments on four public datasets validate the effectiveness of the proposed framework and its compatibility with different sequential recommendation systems. The code and data are released for reproducibility: https://github.com/Applied-Machine-Learning-Lab/IADSR.
Learning Conditional Probability Distributions for Robust Probabilistic Inference in Bayesian Network
Bayesian Network (BN) has been widely employed for many applications like medical diagnosis due to its ability to deal with probabilistic inferences. Real-world inference tasks in BN cannot be robustly processed by classic search-based inference algorithms, since the conditional probabilities w.r.t. given arbitrary evidence values may be missing (i.e., not included) in the conditional probability tables (CPTs). Most of the existing methods, relying on imputation models, density estimation models or deep neural networks, cannot accurately learn these missing probabilities. To this end, we incorporate the idea of learning and search for robust probabilistic inferences in BN. Firstly, we decompose the probabilistic inference task into missing and existing probability factors, ensuring the consistency of their probability spaces. Secondly, we define the Wasserstein distance between missing and existing probability factors, and incorporate the idea of generative adversarial network to obtain missing probability factors with the minimal Wasserstein distance. Finally, we give the algorithm for robust probabilistic inferences with arbitrary evidence values, which could also be used to deal with the probabilistic inferences with arbitrary query values. Extensive experiments on synthetic and real-world datasets are conducted to demonstrate the superiority of our proposed method.
Adaptive Context-Infused Performance Evaluator for Iterative Feature Space Optimization
Iterative feature space optimization includes continuously evaluating and refining the feature space to improve downstream task performance. However, existing methods commonly suffer from three major limitations: 1) ignoring differences between samples leads to evaluation bias; 2) the feature space is overly tailored to specific models, resulting in overfitting and poor generalization; and 3) retraining the evaluator from scratch in each iteration significantly reduces overall efficiency. To bridge these gaps, we introduce EASE (gEneralized Adaptive feature Space Evaluator), a generalized framework for efficient and objective evaluation of iteratively generated feature spaces. This framework includes two key components: Feature-Sample Subspace Generator and Contextual Attention Evaluator. The first component aims to mitigate evaluation bias by decoupling the information distribution within the feature space. To achieve this, based on feedback from the subsequent evaluator, we identify the samples most challenging for evaluation and the features most relevant to prediction tasks. The second component intends to incrementally capture evolving patterns of the feature space for efficient evaluation. Specifically, we propose a weighted-sharing multi-head attention mechanism to encode the feature space into an embedding vector for evaluation, and update the evaluator incrementally to retain prior knowledge while incorporating new information. Extensive experiments on fifteen public datasets demonstrate the effectiveness of EASE. We have released our code and data to the public.
Robust Multi-Label Learning with Instance-Dependent Label Noise
Multi-label learning focuses on tasks where each instance is associated with multiple labels. Due to the high cost of obtaining accurate annotations, real-world multi-label datasets often contain noisy labels from crowdsourcing or automated annotation. Noisy multi-label learning has hence been studied to address the label noise problem. However, existing noisy multi-label methods struggle to handle instance-dependent noise (IDN), which is a complex and common type of label noise in practical applications, severely affecting the reliability of multi-label learning. In contrast, existing methods designed for single-label IDN cannot be directly applied to multi-label data. To address these challenges, we propose Robust multi-label learning with Instance-Dependent label noisE (RIDE), a framework for multi-label learning with IDN. Specifically, RIDE first decomposes the observed label matrix into clean and noisy components via a joint low-rank and sparse decomposition. Secondly, a linear sparse mapping from feature space to label space is introduced to explicitly model how instance features induce IDN. Thirdly, to further improve the denoising accuracy, RIDE estimates a noise suppression coefficient for each sample, which weights the sparse regularization term of the noise decomposition. In addition, theoretical analysis is provided to derive upper bounds on the noise estimation and generalization errors. Extensive experiments on benchmark multi-label datasets with varying noise rates show that RIDE outperforms state-of-the-art methods. The code is available at https://github.com/View5U/RIDE.
FedGC: Contrastive-enhanced Subgraph Federated Learning with Grouping Pseudo-Label
Graph structures are widely used to model relational data. In many real-world applications, each data client usually holds only a partial subgraph of the original graph, and privacy concerns limit data exchange between clients. This decentralized data distribution often degrades the effectiveness of conventional Graph Neural Networks (GNNs). Recently, subgraph federated learning has been proposed to enable collaborative training on subgraph data without compromising privacy. However, two critical challenges remain: (1) Missing links between subgraphs from different clients significantly hinder the message-passing process in GNNs. (2) Varying data distributions across subgraphs lead to the non-IID issue (e.g., node label skews), which requires personalization of local clients in subgraph federated learning scenarios. To address these challenges, we propose FedGC, a novel subgraph federated method that combines pseudo label-based client grouping with Local-Global Contrastive tasks. Specifically, FedGC initializes a random graph on the server, leverages predicted pseudo label distributions to group clients, and assigns aggregation weights based on the similarity of these distributions. Additionally, FedGC incorporates Local-Global Contrastive tasks into the local client learning process to achieve personalized client parameter updates. By adjusting the balance between the local supervision task and the contrastive task, FedGC enables each client to effectively control the balance of local and global information. Extensive experiments on six real-world datasets that cover citation networks and social networks validate the superior performance of FedGC compared to state-of-the-art baselines.
GraphIAM: Two-Stage Algorithm for Improving Class-Imbalanced Node Classification on Attribute-Missing Graphs
Addressing class-imbalanced graphs is a challenging task due to the involvement of both node attributes and graph structures. Existing works on class-imbalanced graphs simply assume that all node attributes are available. However, in real-world graphs, many nodes may lack attributes due to privacy issues or missing data, making class-imbalanced graph learning more challenging. In this paper, we propose GraphIAM, a novel two-stage algorithm for improving class-imbalanced node classification on attribute-missing graphs. In the pre-training phase, GraphIAM adopts graph contrastive learning with oversampling to tackle both attribute-missing and class-imbalanced issues. During fine-tuning, an adapter mechanism is introduced to learn node representations, alleviating the generalization gap between pre-training and downstream tasks. Experimental results on benchmark datasets demonstrate that our method achieves state-of-the-art performance, outperforming class-imbalanced graph learning approaches by 5% in F Score on graphs with severe attribute missingness.
Dual-Space Masked Reconstruction for Robust Self-Supervised Human Activity Recognition
Human Activity Recognition (HAR) based on wearable sensors faces critical challenges, including limited labeled data, distribution shifts, and sensitivity to sensor noise. To address these issues, this paper proposes a novel self-supervised learning (SSL) framework that leverages dual-space masked reconstruction to learn robust and generalizable representations for sensor-based HAR. Specifically, we design a new pretext task in the pre-training stage, which learns representations by reconstructing the original signals with partial masks in both the original space and the latent space. For the reconstruction task in the original space, we adopt an approach similar to Masked Autoencoders (MAE). In the latent space, we introduce a Spectral block to extract more discriminative representations from the time-domain and frequency-domain information. Then, we implement feature-level reconstruction using a Mean Teacher Network. The feature extractor trained through this pretext task is subsequently utilized for challenging downstream HAR tasks. Experiments on four public datasets (MotionSense, UCI-HAR, PAMAP2 and RealWorld) demonstrate the framework's superiority over state-of-the-art SSL methods, achieving an average 2.45% improvement in Macro F1-score under fine-tuning protocols. By unifying local and global signal interaction, this work achieves excellent activity recognition performance in scenarios with scarce labeled data and non-IID data, which are closer to the real world. It provides a scalable solution for noise-robust activity recognition in heterogeneous environments, thereby advancing HAR.
Frequency-Domain Disentanglement-Fusion and Dual Contrastive Learning for Sequential Recommendation
Sequential recommendation (SR) aims to provide personalized recommendations by capturing behavioral intents from existing user interaction sequences. Most previous studies are based on attention mechanisms; however, these approaches suffer from inherent over-smoothing issues that limit their ability to capture transient behavioral signals reflecting the user's immediate intents in interaction sequences. Recently, frequency-domain analysis methods based on the Fourier transform have garnered significant attention in the sequential recommendation domain. By applying the Fourier transform, interaction sequences can be mapped to the frequency domain, enabling direct analysis and targeted manipulation of distinct frequency components. In addition to the inherent limitations of self-attention mechanisms, sequential recommendation faces persistent challenges such as data sparsity and noise. To address these issues, we propose Frequency-Domain Disentanglement-Fusion and Dual Contrastive Learning for Sequential Recommendation (FDCLRec). FDCLRec replaces self-attention mechanisms with a frequency-domain adaptive filtering module, which decouples sequence patterns into distinct high-/low-frequency components and synthesizes comprehensive sequence representations through adaptively weighted fusion. In addition, two auxiliary contrastive learning tasks (augmented-view contrasting and same-target sequence contrastive learning) are strategically integrated to alleviate data sparsity and interaction noise. Extensive experiments on four real-world datasets demonstrate that our model outperforms baseline methods.
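A bare-bones version of the frequency-domain split reads as follows: the sequence is mapped to the frequency domain with an FFT, low and high bands are filtered with learnable per-frequency weights, and the two branches are fused adaptively. The cutoff and fusion scheme are assumptions for illustration, not the FDCLRec module.

```python
# Hedged sketch of a frequency-domain split-and-fuse filter for sequence representations.
import torch
import torch.nn as nn

class FreqSplitFilter(nn.Module):
    def __init__(self, seq_len: int, d_model: int, cutoff_ratio: float = 0.2):
        super().__init__()
        n_freq = seq_len // 2 + 1
        self.cutoff = max(1, int(n_freq * cutoff_ratio))
        self.low_w = nn.Parameter(torch.ones(n_freq, d_model))    # learnable low-band filter
        self.high_w = nn.Parameter(torch.ones(n_freq, d_model))   # learnable high-band filter
        self.alpha = nn.Parameter(torch.tensor(0.5))              # adaptive fusion weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        spec = torch.fft.rfft(x, dim=1)                           # (batch, n_freq, d_model), complex
        low, high = spec.clone(), spec.clone()
        low[:, self.cutoff:] = 0
        high[:, :self.cutoff] = 0
        low_t = torch.fft.irfft(low * self.low_w, n=x.size(1), dim=1)
        high_t = torch.fft.irfft(high * self.high_w, n=x.size(1), dim=1)
        a = torch.sigmoid(self.alpha)
        return a * low_t + (1 - a) * high_t

m = FreqSplitFilter(seq_len=50, d_model=16)
print(m(torch.randn(4, 50, 16)).shape)
```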
Efficient Knowledge Graph Unlearning with Zeroth-order Information
Due to regulations like the Right to be Forgotten, there is growing demand for removing training data and its influence from models. Since full retraining is costly, various machine unlearning methods have been proposed. In this paper, we present an efficient knowledge graph (KG) unlearning algorithm. We note that KG unlearning is nontrivial due to the distinctive structure of KGs and the semantic relations between entities. Also, unlearning by estimating the influence of removed components incurs significant computational overhead when applied to large-scale knowledge graphs. To this end, we define an influence function for KG unlearning and propose to approximate the model's sensitivity without expensive computation of first-order and second-order derivatives for parameter updates. Specifically, we use Taylor expansion to estimate the parameter changes caused by data removal. Given that the first-order gradients and second-order derivatives dominate the computational load, we use Fisher matrices and zeroth-order optimization to approximate the inverse-Hessian vector product without constructing computational graphs. Our experimental results demonstrate that the proposed method significantly outperforms other state-of-the-art graph unlearning baselines in terms of unlearning efficiency and unlearning quality. Our code is released at https://github.com/NKUShaw/ZOWFKGIF.
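The "without constructing computational graphs" part above rests on zeroth-order estimation, i.e., recovering derivative information from loss evaluations alone. The sketch below shows the basic random-direction estimator on a toy quadratic; the loss, parameters, and perturbation scale are placeholders, and this is not the paper's full inverse-Hessian-vector-product approximation.

```python
# Hedged sketch: zeroth-order (finite-difference) gradient estimate without autograd graphs.
import torch

@torch.no_grad()
def zeroth_order_grad(loss_fn, params: torch.Tensor, n_dirs: int = 2000, mu: float = 1e-3):
    """Estimate grad(loss_fn)(params) from random-direction finite differences only."""
    grad = torch.zeros_like(params)
    for _ in range(n_dirs):
        u = torch.randn_like(params)
        diff = loss_fn(params + mu * u) - loss_fn(params - mu * u)
        grad += (diff / (2 * mu)) * u
    return grad / n_dirs

# Sanity check on a quadratic whose true gradient is 2 * theta.
theta = torch.tensor([1.0, -2.0, 0.5])
print(zeroth_order_grad(lambda p: (p ** 2).sum(), theta), 2 * theta)
```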
A Robust and High-Efficiency Active Clustering Framework with Multi-User Collaboration
Active constraint-based clustering enhances semi-supervised clustering through a machine-led interaction process. This approach dynamically selects the most informative constraints to query, minimizing the number of human annotations required. Existing methods face three key challenges in real-world applications: scalability, timeliness, and robustness against user annotation errors. In this work, we propose a robust and high-efficiency Active Clustering framework with Multi-user Collaboration (ACMC). ACMC constructs a diffusion tree using the nearest-neighbor technique and employs a multi-user online collaboration framework to iteratively refine clustering results. In each iteration: (a) nodes with high uncertainty and representativeness are selected in batch; (b) well-designed multi-user asynchronous query categorizes selected nodes using neighborhood sets, reducing individual workloads and improving overall timeliness; (c) user-provided constraints and newly discovered categories are synchronized, with user confidences dynamically updated to enhance robustness against erroneous annotations; (d) categorized nodes, stored in neighborhood sets, serve as sources in the diffusion tree to refine the clusters. Experimental results demonstrate that ACMC outperforms baseline methods in terms of clustering quality, scalability, and robustness against user annotation errors.
Federated Approximate Query Processing Based on Deep Models
Data isolation poses a significant challenge to efficient big data query processing, as data providers are often reluctant to share their raw data due to security concerns. Current federated query systems address this issue by employing Secure Multi-Party Computation (SMC) and Differential Privacy (DP) to facilitate secure and collaborative computation. However, these privacy-preserving methods rely on cryptographic protocols, which introduce substantial computational overhead, slowing query processing by up to 1,000 times compared to plaintext queries. While sampling methods have been explored to enhance federated query systems, they frequently fail to strike a balance between accuracy and speed. To address the limitations above, we propose a secure federated approximate query system based on a deep classifier (SAQDC). The system uses deep learning to accelerate query processing while integrating SMC and DP to balance privacy and efficiency: each data provider trains classifiers built on Multi-Layer Perceptron (MLP) and Deep Set architectures, which predict query relative errors across different modules. Based on the prediction errors generated by the classifiers, queries are assigned to the most appropriate approximate query model and the differential privacy parameters are adjusted to enhance query accuracy. This approach improves query speed, preserves accuracy, and effectively mitigates malicious differential privacy attacks. We demonstrate SAQDC's superior performance through extensive experiments on three large-scale datasets.
Lead-LagNet: Exploiting Lead-Lag Dependencies for Cross-Series Temporal Prediction
In many real-world systems, the evolution of one time series often leads or lags that of its related peers rather than moving in perfect synchrony. Graph Neural Networks (GNNs) are widely used to model such inter-series lead-lag relationships, representing entities as graph nodes with time-series attributes. However, existing methods typically collapse temporal information into discrete points and adopt uniform messaging mechanisms that assume synchronized upward/downward effects and identical time lags among related peers, which are often inconsistent with real-world dynamics. Furthermore, stacking GNN layers to capture multi-hop influences reduces interpretability, hindering understanding of the underlying dynamics. To address these issues, we propose Lead-LagNet, a framework designed to capture diverse cross-series propagation patterns with lead-lag phenomena in time series. Lead-LagNet identifies meaningful subsequences in time series and employs a gating mechanism to establish lead-lag connections, enabling the model to uncover complex influencing patterns without relying on predefined relationships. By decoupling the linear messaging process from non-linear feature extraction, the proposed Lead-LagNet enhances both modeling flexibility and interpretability. Experimental evaluation on both synthetic tasks and real-world datasets demonstrates the superiority of Lead-LagNet over state-of-the-art algorithms, including BiGRU, SFM, TGC, FinGAT and ADGAT. Our code and data are available at https://github.com/FICLAB/LeadLagNet.
LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking
Specialized entity linking (EL) models are well-trained at mapping mentions to unique knowledge base (KB) entities according to a given context. However, specialized EL models struggle to disambiguate long-tail entities due to their limited training data. Meanwhile, extensively pre-trained large language models (LLMs) possess broader knowledge of uncommon entities. Yet, with a lack of specialized EL training, LLMs frequently fail to generate accurate KB entity names, limiting their standalone effectiveness in EL. With the observation that LLMs are more adept at context generation instead of EL execution, we introduce LLM-Augmented Entity Linking (LLMAEL), the first framework to enhance specialized EL models with LLM data augmentation. LLMAEL leverages off-the-shelf, tuning-free LLMs as context augmenters, generating entity descriptions to serve as additional input for specialized EL models. Experiments show that LLMAEL sets new state-of-the-art results across 6 widely adopted EL benchmarks: compared to prior methods that integrate tuning-free LLMs into EL, LLMAEL achieves an absolute 8.9% gain in EL accuracy. We release our code and datasets.
Multi-Turn Interactions for Text-to-SQL with Large Language Models
This study explores text-to-SQL parsing by leveraging the powerful reasoning capabilities of large language models (LLMs). Despite recent advancements, existing LLM-based methods are still inefficient and struggle to handle cases with wide tables effectively. Furthermore, current interaction-based approaches either lack a step-by-step, interpretable SQL generation process or fail to provide a universally applicable interaction design. To address these challenges, we introduce Interactive-T2S, a framework that generates SQL queries through direct interactions with databases. This framework includes four general tools that facilitate proactive and efficient information retrieval by the LLM. Additionally, we have developed detailed exemplars to demonstrate the step-wise reasoning processes within our framework. Our approach achieves advanced performance on the Spider and BIRD datasets as well as their variants. Notably, we obtain state-of-the-art results on the BIRD leaderboard under the setting without oracle knowledge, demonstrating the effectiveness of our method. Code and data are available at: https://github.com/JimXiongGM/Interactive-Text-to-SQL.
Interactive Text-to-Visualization: Refining Visualization Outputs Through Natural Language User Feedback
Data visualization (DV) is important for data analysis applications, as it exposes hidden patterns and presents insights from data. The task of Text-to-Vis, which takes text as input and generates data visualizations, was proposed to lower the threshold for creating DVs. However, existing methods treat this as a one-shot mapping problem, directly outputting the final DVs without considering any user feedback for refining them. Motivated by this, a more interactive scenario is investigated, in which users can provide natural language feedback to refine the generated DVs. The scenario is formulated as the Text-to-Vis with Feedback problem. A new dataset is also created to further study the problem, containing the user utterance, database schema, generated DVs, the natural language feedback and the refined DVs. A large language model (LLM) based framework named Vis-Edit is designed to handle this task, comprising schema linking, clause location, clause generation, merger and self-consistency. Finally, extensive experiments demonstrate the effectiveness of Vis-Edit.
ESED: Emotion-Specific Evidence Decomposition for Uncertainty-Aware Multimodal Emotion Recognition in Conversation
Multimodal emotion recognition in conversations is inherently challenging due to ambiguous cues, modality conflicts, and temporal dynamics, all of which contribute to complex and diverse sources of uncertainty. While some recent methods incorporate uncertainty modeling, they often focus on overall prediction confidence without explicitly distinguishing the different sources of uncertainty introduced by underlying factors. To address these challenges, we propose a novel Emotion-Specific Evidence Decomposition framework (ESED) that leverages evidential deep learning to explicitly model and disentangle multimodal emotional uncertainty. Rather than directly fusing features, ESED decomposes each modality's evidence into three interpretable components: (1) emotion-consistent evidence, capturing shared emotional cues across modalities; (2) emotion-specific evidence, highlighting the unique emotional role of each modality; and (3) dynamic evidence, modeling utterance-level temporal variations. These components are adaptively weighted based on emotional intensity, ambiguity, and dynamicity, quantified via prediction entropy, inter-modal divergence, and temporal variance. The final prediction is obtained through an adaptive fusion of these weighted components. Extensive experiments on the MELD and IEMOCAP datasets show that ESED outperforms state-of-the-art methods, demonstrating the effectiveness of our approach.
Reinforcement Learning-Driven Generative Retrieval with Semantic-aligned Multi-Layer Identifiers
Generative retrieval enhances retrieval effectiveness by generating document identifiers expressed in natural language. However, current methods often struggle with two major challenges: limited identifier quality and insufficient query-document interaction, leading to limited retrieval performance. To tackle these challenges, we propose a novel generative retrieval framework that integrates semantic-aligned multi-layer identifiers with reinforcement learning. To improve identifier quality, we design a prompt-driven multi-task learning strategy to generate three types of hierarchical identifiers, namely summary, keyword, and pseudo-query, to capture multi-granularity document semantics. Furthermore, we adopt supervised fine-tuning to integrate these identifiers. To improve query-document interaction, we devise a multi-view ranking fusion mechanism that combines retrieval results across multi-layer identifiers. We further employ GRPO-based reinforcement learning with dense similarity rewards and a difficulty-aware negative sampling strategy to optimize the generated identifiers. Experiments on multiple benchmark datasets show that our framework significantly outperforms existing generative retrieval methods, offering a promising solution for building more effective and semantically aligned retrieval systems. The code for our model is publicly available at https://github.com/yicentian02/GRAM-RL.
The Structure of Cross-National Collaboration in Open-Source Software Development
Open-source software (OSS) development platforms, such as GitHub, expand the potential for cross-national collaboration among developers by lowering the geographic, temporal, and coordination barriers that limited software innovation in the past. However, research has shown that the technological affordances that facilitate cross-national collaboration do not uniformly benefit all countries. Using the GitHub Innovation Graph dataset, which aggregates the complete cross-country collaborations among the entire population of GitHub developers, we present quantitative evidence of deep-seated religious and cultural affinities, shared colonial histories, and geopolitical factors structuring the collaborations between non-U.S. country pairs that become visible when the overarching dominance of the U.S. is removed from the data. This study highlights the opportunities to develop decentralizing strategies to facilitate new collaborations between developers in non-U.S. countries, thereby fostering the development of novel, innovative solutions. More generally, this study also underscores the importance of contextualizing user behavior and knowledge management in information systems with long-term, macro-social conditions in which these systems are inextricably embedded.
CLAP: Coreference-Linked Augmentation for Passage Retrieval
Large Language Model (LLM)-based passage expansion has shown promise for enhancing first-stage retrieval, but often underperforms with dense retrievers due to semantic drift and misalignment with their pretrained semantic space. Beyond this, only a portion of a passage is typically relevant to a query, while the rest introduces noise, an issue compounded by chunking techniques that break coreferential continuity. We propose Coreference-Linked Augmentation for Passage Retrieval (CLAP), a lightweight LLM-based expansion framework that segments passages into coherent chunks, resolves coreference chains, and generates localized pseudo-queries aligned with dense retriever representations. A simple fusion of global topical signals and fine-grained subtopic signals achieves robust performance across domains. CLAP yields consistent gains even as retriever strength increases, enabling first-stage retrieval to match or surpass second-stage rerankers such as BM25 + MonoT5-3B, exceeding the reranker by up to +20.68 nDCG@10 on ArguAna. These improvements are especially notable in out-of-domain settings, where conventional LLM-based expansion methods relying on domain knowledge often falter. CLAP instead adopts a logic-centric pipeline that enables robust, domain-agnostic generalization.
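As a rough illustration of the fusion idea described above (a global topical signal blended with fine-grained subtopic signals), here is a minimal sketch; the weighting scheme, max-pooling over pseudo-queries, and all names are assumptions rather than CLAP's actual implementation.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def fused_score(query_vec, passage_vec, pseudo_query_vecs, alpha=0.5):
    """Blend a global topical signal (whole-passage similarity) with the best
    fine-grained subtopic signal (similarity to localized pseudo-queries)."""
    global_sim = cosine(query_vec, passage_vec)
    local_sim = max(cosine(query_vec, v) for v in pseudo_query_vecs)
    return alpha * global_sim + (1 - alpha) * local_sim

# Toy usage with random vectors standing in for dense-retriever embeddings.
rng = np.random.default_rng(1)
q = rng.normal(size=16)                            # query embedding
p = rng.normal(size=16)                            # whole-passage embedding
pq = rng.normal(size=(4, 16))                      # pseudo-queries for 4 chunks
print(round(fused_score(q, p, pq), 3))
```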
LiveVal: Real-time and Trajectory-based Data Valuation via Adaptive Reference Points
Data valuation quantifies the contribution of each training data, enabling harmful data detection and enhancing model robustness. However, existing methods are typically post-hoc and require fully trained models, making them computationally expensive and unable to detect harmful data early in training. We propose LiveVal, a real-time and trajectory-based data valuation method that assesses training data by analyzing their influence on the optimization trajectory. LiveVal includes three key innovations: 1) a real-time valuation framework with minimal overhead, seamlessly integrated into standard training processes; 2) an adaptive reference point mechanism that assesses data impact on generalization; and 3) a normalization technique that ensures fair comparisons across training stages. Theoretical analysis shows that LiveVal achieves directional alignment, boundedness, stability, and fairness. Experiments demonstrate that LiveVal achieves up to 180× speedup over baseline methods while maintaining robust performance across diverse models and datasets.
KV-Auditor: Auditing Local Differential Privacy for Correlated Key-Value Estimation
To protect privacy for data-collection-based services, local differential privacy (LDP) is widely adopted due to its rigorous theoretical bound on privacy loss. However, mistakes in complex theoretical analysis or subtle implementation errors may undermine its practical guarantee. To address this, auditing is crucial to confirm that LDP protocols truly protect user data. Existing auditing methods, however, mainly target machine learning and federated learning tasks based on centralized differential privacy (DP), with limited attention to LDP. Moreover, the few studies on LDP auditing focus solely on the simple frequency estimation task for discrete data, leaving correlated key-value data, which requires both discrete frequency estimation for keys and continuous mean estimation for values, unexplored. To bridge this gap, we propose KV-Auditor, a framework for auditing LDP-based key-value estimation mechanisms by estimating their empirical privacy lower bounds. Rather than relying on binary output predictions as traditional LDP auditing methods do, KV-Auditor estimates this lower bound by analyzing unbounded output distributions, supporting continuous data. Specifically, we classify state-of-the-art LDP key-value mechanisms into interactive and non-interactive types. For non-interactive mechanisms, we propose horizontal KV-Auditor for small domains with sufficient samples and vertical KV-Auditor for large domains with limited samples. For interactive mechanisms, we design a segmentation strategy to capture incremental privacy leakage across iterations. Finally, we perform extensive experiments to validate the effectiveness of our approach, offering insights for optimizing LDP-based key-value estimators.
Improving Recommendation Fairness via Graph Structure and Representation Augmentation
Graph Convolutional Networks (GCNs) have become increasingly popular in recommendation systems. However, recent studies have shown that GCN-based models will cause sensitive information to disseminate widely in the graph structure, amplifying data bias and raising fairness concerns. While various fairness methods have been proposed, most of them neglect the impact of biased data on representation learning, which results in limited fairness improvement. Moreover, some studies have focused on constructing fair and balanced data distributions through data augmentation, but these methods significantly reduce utility due to disruption of user preferences. In this paper, we aim to design a fair recommendation method from the perspective of data augmentation to improve fairness while preserving recommendation utility. To achieve fairness-aware data augmentation with minimal disruption to user preferences, we propose two prior hypotheses. The first hypothesis identifies sensitive interactions by comparing outcomes of performance-oriented and fairness-aware recommendations, while the second one focuses on detecting sensitive features by analyzing feature similarities between biased and debiased representations. Then, we propose a dual data augmentation framework for fair recommendation, which includes two data augmentation strategies to generate fair augmented graphs and feature representations. Furthermore, we introduce a debiasing learning method that minimizes the dependence between the learned representations and sensitive information to eliminate bias. Extensive experiments on two real-world datasets demonstrate the superiority of our proposed framework.
SST: Multi-Scale Hybrid Mamba-Transformer Experts for Time Series Forecasting
Time series forecasting has made significant advances, including with Transformer-based models. The attention mechanism in Transformer effectively captures temporal dependencies by attending to all past inputs simultaneously. However, its quadratic computational complexity with respect to sequence length limits the scalability for long-range modeling. Recent state space models (SSMs) such as Mamba offer a promising alternative by achieving linear complexity without attention. Yet, Mamba compresses historical information into a fixed-size latent state, potentially causing information loss and limiting representational effectiveness. This raises a key research question: Can we design a hybrid Mamba-Transformer architecture that is both effective and efficient for time series forecasting? To address it, we adapt a hybrid Mamba-Transformer architecture Mambaformer, originally proposed for language modeling, to the time series domain. Preliminary experiments reveal that naively stacking Mamba and Transformer layers in Mambaformer is suboptimal for time series forecasting, due to an information interference problem. To mitigate this issue, we introduce a new time series decomposition strategy that separates time series into long-range patterns and short-range variations. Then we show that Mamba excels at capturing long-term structures, while Transformer is more effective at modeling short-term dynamics. Building on this insight, we propose State Space Transformer (SST), a multi-scale hybrid model with expert modules: a Mamba expert for long-range patterns and a Transformer expert for short-term variations. To facilitate learning the patterns and variations, SST employs a multi-scale patching mechanism to adaptively adjust time series resolution: low resolution for long-term patterns and high resolution for short-term variations. Comprehensive experiments on real-world datasets demonstrate that SST achieves state-of-the-art performance while scaling linearly with sequence length (O(L)). The code is available on GitHub.
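A minimal sketch of the decomposition-plus-multi-scale-patching idea might look as follows; the moving-average decomposition and the patch lengths are illustrative assumptions, not SST's exact design.

```python
import numpy as np

def decompose(series, window=24):
    """Split a series into a smooth long-range pattern (moving average) and
    short-range variations (the residual)."""
    kernel = np.ones(window) / window
    trend = np.convolve(series, kernel, mode="same")
    return trend, series - trend

def patchify(series, patch_len):
    """Cut a 1-D series into non-overlapping patches of length patch_len."""
    n = len(series) // patch_len
    return series[: n * patch_len].reshape(n, patch_len)

rng = np.random.default_rng(2)
x = np.sin(np.arange(336) / 24 * 2 * np.pi) + 0.1 * rng.normal(size=336)
long_part, short_part = decompose(x)
# Low-resolution (long) patches would feed a Mamba-style expert,
# high-resolution (short) patches a Transformer-style expert.
print(patchify(long_part, 48).shape, patchify(short_part, 12).shape)
```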
Contrastive Multi-View Graph Hashing
Multi-view graph data, which captures both node attributes and rich relational information from diverse sources, is becoming increasingly prevalent in various domains. The effective and efficient retrieval of such data is an important task. Although multi-view hashing techniques offer a paradigm for fusing diverse information into compact binary codes, they typically assume attribute-based inputs per view. This makes them unsuitable for multi-view graph data, where effectively encoding and fusing complex topological information from multiple heterogeneous graph views to generate unified binary embeddings remains a significant challenge. In this work, we propose Contrastive Multi-view Graph Hashing (CMGHash), a novel end-to-end framework designed to learn unified and discriminative binary embeddings from multi-view graph data. CMGHash learns a consensus node representation space using a contrastive multi-view graph loss, which aims to pull k-nearest neighbors from all graphs closer while pushing away negative pairs, i.e., non-neighbor nodes. Moreover, we impose binarization constraints on this consensus space, enabling its conversion to a corresponding binary embedding space at minimal cost. Extensive experiments on several benchmark datasets demonstrate that CMGHash significantly outperforms existing approaches in terms of retrieval accuracy.
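The following toy sketch illustrates the general shape of a consensus-space contrastive objective with neighbors pooled across views, plus sign-based binarization; it is an assumption-laden simplification, not CMGHash's actual loss.

```python
import numpy as np

def contrastive_multiview_loss(z, neighbor_sets, temperature=0.2):
    """Toy consensus-space contrastive loss: for each node, its k-nearest
    neighbors pooled over all graph views are positives; all other nodes
    act as negatives."""
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-9)
    sim = z @ z.T / temperature
    total = 0.0
    for i, positives in enumerate(neighbor_sets):
        logits = np.exp(sim[i])
        logits[i] = 0.0                            # exclude self-similarity
        total += -np.log(logits[list(positives)].sum() / logits.sum() + 1e-12)
    return total / len(neighbor_sets)

def binarize(z):
    """Convert the (binarization-constrained) consensus embeddings to binary codes."""
    return np.where(z >= 0, 1, -1)

rng = np.random.default_rng(3)
z = rng.normal(size=(6, 8))                        # consensus node embeddings
neighbors = [{1, 2}, {0, 2}, {0, 1}, {4, 5}, {3, 5}, {3, 4}]  # kNN union over views
print(round(float(contrastive_multiview_loss(z, neighbors)), 3))
print(binarize(z)[0])
```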
From Policy Comparison to Process Consistency and Beyond
Statistical Policy Comparison (SPC) assesses the equivalence of two stochastic policies (policy consistency) and has received broad attention. However, the SPC framework implicitly assumes the invariance of decision environments and therefore fails to address a wide range of real-world data science applications. In this work, we refer to this overlooked issue as environment consistency; together with policy consistency, it extends to a generalized concept, process consistency, for systematically comparing policy trials under the Markov decision process (MDP) framework. To address process consistency, we propose a unified comparison framework that extends beyond traditional statistical policy comparison studies by incorporating both policy and environment comparisons. For policy consistency, existing statistical policy comparison methods can be seamlessly integrated into our framework without modification. Specifically for environment consistency (the focus of this work), we devise fine-grained return tests to capture shifts of key elements in MDPs; notably, in special cases where trajectory likelihood information is available or can be estimated, we introduce a trajectory test based on the likelihood ratio test (LRT), offering increased testing power. Extensive experiments demonstrate that our proposed testing methods achieve higher statistical power than existing approaches in testing process consistency, establishing their effectiveness across diverse real-world scenarios. Our code is available at https://github.com/bcxyf123/MDP-Testing.git.
LGC-CR: Few-shot Knowledge Graph Completion via Local Global Contrastive Learning and LLM-Guided Refinement
Recent years have witnessed increasing interest in few-shot knowledge graph completion (FKGC), which aims to infer novel query triples for few-shot relations from limited references. Despite promising progress, existing methods face two key challenges: (1) They often overlook rich higher-order neighbors, while traditional high-order aggregation methods are prone to introducing noise and lack effective alignment across multi-view neighborhood information. (2) Meta-learning methods over-rely on embeddings, making them susceptible to spurious relational patterns. Meanwhile, LLM-based methods, despite their potential, suffer from hallucinations and input constraints. To this end, we propose a novel framework that combines meta-learning, enhanced via a Local-Global Contrastive network, with LLM-guided Contextual Refinement (LGC-CR). At the data level, we design a local-global contrastive network to jointly aggregate relevant local features and capture stable global representations while filtering high-order noise, then align these two views through a dual contrast module to ensure consistency. At the model level, we employ an LLM refinement module, which retrieves relevant contexts to construct prompts and applies a knowledge selector to identify high-quality facts based on diversity and centrality, enabling efficient fine-tuning of LLMs to refine the preliminary predictions of meta-learning. The experimental results demonstrate that LGC-CR delivers better and more robust performance than state-of-the-art baselines, with Hit@1 improvements of 8.1%, 21.7%, and 20.6% on NELL, Wiki, and FB15K, respectively.
KALE: Knowledge Aggregation for Label-free Model Enhancement
Large foundation models have demonstrated remarkable success in natural language processing and computer vision. Applying the large models to downstream tasks often requires fine-tuning, in order to boost the predictive accuracy. However, the fine-tuning process relies heavily on labeled data and extensive training. This dependency makes fine-tuning impractical for niche applications, such as rare object detection or specialized medical tasks. To overcome these limitations, we propose KALE: Knowledge Aggregation for Label-free model Enhancement, a label-free method for model enhancement, leveraging knowledge aggregation via model fusion and adaptive representation alignment. Our method is powered by a carefully designed joint self-cooperative optimization function that considers (i) multi-granularity optimization (task-specific and layer-specific), (ii) self and cooperative supervision integration, and (iii) mitigation of error accumulation caused by entropy minimization. Additionally, we introduce a class cardinality-aware sample filtering to ensure the stability of the fusion process. We also design a lightweight representation alignment technique to refine the fusion coefficient in a few shots for quality enhancement. We evaluate our method on multiple image classification datasets using ViT-B/32 and ViT-L/14 backbones. Experimental results demonstrate that our label-free method consistently outperforms state-of-the-art unsupervised approaches, including TURTLE and supervised full fine-tuning, in terms of average performance. Specifically, compared to TURTLE, our method achieves average improvements of 20.7% with ViT-B/32 and 19.5% with ViT-L/14. Furthermore, on the challenging SUN397 dataset, our method surpasses supervised full fine-tuning by 4% and 2.3% with ViT-B/32 and ViT-L/14, respectively.
Fine-Grained Graph Rationalization
Rationale discovery is defined as finding a subset of the input data that maximally supports the prediction of downstream tasks. In the context of graph machine learning, graph rationale is defined as identifying the critical subgraph in the given graph topology. In contrast to the rationale subgraph, the remaining subgraph is named the environment subgraph. Graph rationalization can enhance the model performance because the mapping between the graph rationale and the prediction label is viewed as invariant, by definition. To ensure the discriminative power of the extracted rationale subgraphs, a key technique named intervention is applied, whose core idea is that given changing environment subgraphs, the semantics from the rationale subgraph is invariant, which guarantees the correct prediction result. However, most, if not all, of the existing graph rationalization methods develop their intervention strategies on the graph level, which is coarse-grained. In this paper, we propose FIne-grained Graph rationalization (FIG). Our idea is driven by the self-attention mechanism, which provides rich interactions between input nodes. Based on that, FIG can achieve node-level and virtual node-level intervention. Our experiments involve 7 real-world datasets, and the proposed FIG shows significant performance advantages compared to 13 baseline methods.
Evaluating and Addressing Fairness Across User Groups in Negative Sampling for Recommender Systems
Recommender systems trained on implicit feedback data rely on negative sampling to distinguish positive items from negative items for each user. Since the majority of positive interactions come from a small group of active users, negative samplers are often impacted by data imbalance, leading them to choose more informative negatives for prominent users while providing less useful ones for less active users. This leads to inactive users being further marginalised in the training process, thus receiving inferior recommendations. In this paper, we conduct a comprehensive empirical study demonstrating that state-of-the-art negative sampling strategies provide more accurate recommendations for active users than for inactive users. We also find that increasing the number of negative samples for each positive item improves average performance, but the benefit is distributed unequally across user groups, with active users experiencing performance gains while inactive users suffer performance degradation. To address this, we propose a group-specific negative sampling strategy that assigns smaller negative ratios to inactive user groups and larger ratios to active groups. Experiments on eight negative samplers show that our approach improves user-side fairness and performance when compared to a uniform global ratio.
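A minimal sketch of a group-specific negative sampling rule of this kind is shown below; the activity threshold and the specific ratios are illustrative assumptions, not the values studied in the paper.

```python
import random

def group_negative_ratio(user_activity, base_ratio=4, active_threshold=20):
    """Assign a per-user negative-sampling ratio by activity group:
    smaller for inactive users, larger for active users."""
    return base_ratio * 2 if user_activity >= active_threshold else max(1, base_ratio // 2)

def sample_negatives(user_hist, all_items, n_neg):
    """Uniformly sample n_neg items the user has not interacted with."""
    pool = [i for i in all_items if i not in user_hist]
    return random.sample(pool, min(n_neg, len(pool)))

catalog = range(100)
active_user = set(range(40))        # 40 interactions -> active group
inactive_user = {3, 7}              # 2 interactions  -> inactive group
for hist in (active_user, inactive_user):
    k = group_negative_ratio(len(hist))
    print(len(hist), "interactions ->", k, "negatives per positive:",
          sample_negatives(hist, catalog, k))
```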
Compensating Information and Capturing Modal Preferences in Multimodal Recommendation: A Dual-Path Representation Learning Framework
In the context of information explosion, multimodal recommender systems (MMRS) have demonstrated great potential in capturing users' complex preferences and enhancing recommendation performance by integrating multimodal data such as images and text. However, multimodal data inherently suffers from semantic inconsistency, which can introduce information conflicts or noise. Moreover, users' reliance on different modalities varies dynamically with context and time (multimodal dynamic preferences). These challenges may lead to truth deviation and deep semantic mismatch, ultimately degrading recommendation performance. To address these issues, we propose Dual-Path Multimodal Recommendation (DPRec), a novel model that improves precision and robustness of the recommendation through cross-modal information compensation and dynamic modal preference learning. Specifically, DPRec first employs a cross-modal attention mechanism to dynamically model inter-modal correlations, effectively exploring complementary and shared features for robust user and item representations. Second, it integrates feature projection, modality alignment, and dynamic weighting mechanisms to adaptively adjust modality importance based on user context, ensuring flexibility in handling preference dynamics. Lastly, a modality contrastive loss is utilized to maximize mutual information between modalities, mitigating semantic mismatch by enhancing deep collaborative representations. Extensive experiments on three public datasets show that DPRec consistently outperforms state-of-the-art (SOTA) methods, achieving average improvements of 3.94% in Recall@20 and 3.84% in NDCG@20. Our code is publicly available at: https://anonymous.4open.science/r/DPRec-4D15.
FedSTEP: Asynchronous and Staleness-Aware Personalization for Efficient Federated Learning
Personalized Federated Learning (PFL) aims to provide client-specific models that adapt to local data distributions while leveraging shared knowledge across clients. A common design in PFL is the head-representation architecture, which combines a shared global representation with a local head on each client. Although effective, deploying this architecture in real-world systems remains challenging due to the presence of stragglers and the high communication cost. To address these issues, we propose FedSTEP, a unified framework that integrates asynchronous training with dynamic communication sparsification. Specifically, it adaptively adjusts each client's local training duration and communication sparsity based on staleness, enabling more efficient coordination between local adaptation and global representation. This design mitigates the impact of stragglers and ensures robust performance in heterogeneous environments. We provide a theoretical analysis of the convergence behavior and communication efficiency of FedSTEP under standard assumptions. Extensive experiments on five public datasets demonstrate that FedSTEP consistently outperforms existing methods. It achieves up to 4.65% higher accuracy, a 3.68× speedup in training, and a 1.91× reduction in communication cost.
Higher-order Structure and Semantics-enhanced User Profiling for Recommendation
Accurate user profiles are crucial for personalized recommendation systems to mitigate information overload on large-scale online platforms. While recent advances in large language models have enhanced semantic understanding for profile construction through textual artifacts, existing methods often neglect the higher-order structural patterns inherent in user-item interaction graphs, a key limitation for achieving accurate and diverse recommendations. In this paper, we propose SSPRec, a higher-order structure and semantics-enhanced user profiling framework for recommendation. Specifically, we first introduce a multi-hop proximity matrix over item-item transitions, followed by low-rank approximation and clustering to group users based on behavioral similarity. Group-level user profiles are then distilled via representative keywords extracted from co-interacted items, and collaborative embeddings are concurrently learned from the interaction graph. To integrate collaborative signals with language-based profiles, we introduce a cross-view contrastive objective that encourages coherence between structural and semantic representations. Final recommendations are made using a fused user-item similarity score. Extensive experiments on four real-world datasets show that SSPRec not only outperforms baselines in accuracy (with 46.35% improvements), but also remains diverse and robust, even under incomplete interactions.
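As a rough illustration of the structural half of this pipeline (multi-hop proximity, low-rank approximation, and clustering), the following numpy-only sketch uses assumed hop counts, ranks, and cluster numbers and is not the SSPRec implementation.

```python
import numpy as np

def multihop_proximity(adj, hops=3, decay=0.5):
    """Weighted sum of powers of the row-normalized item-item transition matrix."""
    p = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-9)
    out, walk = np.zeros_like(p), np.eye(len(p))
    for h in range(1, hops + 1):
        walk = walk @ p
        out += (decay ** h) * walk
    return out

rng = np.random.default_rng(4)
adj = (rng.random((20, 20)) < 0.2).astype(float)   # toy item-item transition counts
prox = multihop_proximity(adj)

# Low-rank approximation via truncated SVD, then a tiny k-means-style grouping
# of the resulting embeddings by behavioural similarity.
u, s, _ = np.linalg.svd(prox)
emb = u[:, :4] * s[:4]
centers = emb[rng.choice(len(emb), 3, replace=False)]
for _ in range(10):
    labels = np.argmin(((emb[:, None, :] - centers) ** 2).sum(-1), axis=1)
    for c in range(3):
        if (labels == c).any():
            centers[c] = emb[labels == c].mean(axis=0)
print(labels)
```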
Ordinal Embedding for Collaborative Filtering: A Unified Regularization for Enhanced Generalization and Interpretability
Collaborative filtering is a primary paradigm of modern recommender systems. A typical practice is to embed collaborative signals into a latent space and infer recommendation scores based on the similarities between user and item embeddings. Besides inter-type similarities (i.e., user-item relationships), intra-type similarities (i.e., user-user, and item-item) are also essential as they capture the intrinsic structure of users and items. However, many existing recommendation models only learn inter-type similarities using objectives like ranking loss or binary classification loss, while neglecting intra-type similarities. Consequently, the intrinsic structures of users and items are often distorted in the latent space, where users with similar historical interactions diverge more than those dissimilar. In this study, we show the importance of preserving the ordinal relations of intra-type similarities. We provide a theoretical analysis suggesting that preserving intra-type similarity rankings can enhance a model's generalizability and interpretability. In addition, we propose a regularization that enforces a constraint on the rankings of intra-type similarities, ensuring that learning inter-type similarities does not break intrinsic ordinal structures. It can be seamlessly integrated into most latent factor models and can be jointly trained with their original objectives. Extensive experiments on 4 benchmark datasets and 5 representative models show that our ordinal regularization can consistently improve recommendation performance, and enhance the intra-type similarity coherence in the latent space. The results also exhibit enhanced generalizability and interpretability of recommendations.
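The following is a minimal sketch of what an ordinal constraint on intra-type similarities could look like as a differentiable regularizer; the hinge form, triplet sampling, and co-interaction target are assumptions, not the paper's exact formulation.

```python
import torch

def ordinal_regularizer(emb, target_sim, margin=0.1, n_triplets=256):
    """Hinge penalty encouraging embedding-space intra-type similarities to
    preserve the ranking given by interaction-based similarities."""
    n = emb.size(0)
    z = torch.nn.functional.normalize(emb, dim=1)
    sim = z @ z.t()
    i = torch.randint(0, n, (n_triplets,))
    j = torch.randint(0, n, (n_triplets,))
    k = torch.randint(0, n, (n_triplets,))
    # keep triplets where the target says j is more similar to i than k is
    keep = target_sim[i, j] > target_sim[i, k]
    violation = margin - (sim[i, j] - sim[i, k])
    return torch.clamp(violation[keep], min=0).mean() if keep.any() else sim.sum() * 0.0

torch.manual_seed(0)
emb = torch.randn(50, 16, requires_grad=True)            # user embeddings
interactions = (torch.rand(50, 30) < 0.2).float()        # toy user-item history
target = interactions @ interactions.t()                 # co-interaction counts as intra-type similarity
loss = ordinal_regularizer(emb, target)                  # add to the model's original objective
loss.backward()
print(float(loss))
```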
Enhancing and Assessing Instruction-Following with Fine-Grained Instruction Variants
Aligning Large Language Models (LLMs) with nuanced user instructions is critical for their effective deployment in real-world applications. While prior methods focus on enhancing data diversity and complexity, they often overlook models' sensitivity to fine-grained variations in semantically similar instructions. To address this, we introduce DeMoRecon, a data augmentation framework that decomposes complex instructions into sub-components, modifies individual elements, and reconstructs them into instruction variants. This method preserves contextual integrity while injecting targeted variability essential for fine-grained instruction-following. Based on DeMoRecon, we construct the FGIV dataset, comprising over 1,700 seed instructions and thousands of nuanced variants designed for both supervised fine-tuning and preference-based alignment. Experimental results show that LLMs trained with FGIV achieve up to +10.2% improvement on our fine-grained FGIV-Eval benchmark and up to +8.8% on existing benchmarks such as FollowBench and InfoBench. These findings highlight the value of FGIV in advancing instruction sensitivity and robustness in LLMs.
MMiC: Mitigating Modality Incompleteness in Clustered Federated Learning
In the era of big data, data mining has become indispensable for uncovering hidden patterns and insights from vast and complex datasets. The integration of multimodal data sources further enhances its potential. Multimodal Federated Learning (MFL) is a distributed approach that enhances the efficiency and quality of multimodal learning, ensuring collaborative work and privacy protection. However, missing modalities pose a significant challenge in MFL, often due to data quality issues or privacy policies across the clients. In this work, we present MMiC, a framework for Mitigating Modality incompleteness in MFL within the Clusters. MMiC replaces partial parameters within client models inside clusters to mitigate the impact of missing modalities. Furthermore, it leverages the Banzhaf Power Index to optimize client selection under these conditions. Finally, MMiC employs an innovative approach to dynamically control global aggregation by utilizing Markowitz Portfolio Optimization. Extensive experiments demonstrate that MMiC consistently outperforms existing federated learning architectures in both global and personalized performance on multimodal datasets with missing modalities, confirming the effectiveness of our proposed solution. Our code is available at https://github.com/gotobcn8/MMiC.
TCPN: Temporal Pyramidal Recurrent Network with Contrastive Learning for Temporal Knowledge Graph Reasoning
Temporal Knowledge Graphs (TKGs) serve as crucial tools for representing dynamic changes in the real world. Extrapolation reasoning within TKGs aims to predict entirely unknown future facts based on limited historical data, offering considerable practical value across various fields. However, existing methods generally focus on the recurrence and periodicity of historical facts while overlooking the dynamic interactions associated with future facts. Moreover, these methods fail to capture historical evolutionary patterns, which grow increasingly complex as historical data accumulates. To this end, we propose TCPN, a novel Temporal Pyramidal Recurrent Network with contrastive learning for TKG extrapolation reasoning. Specifically, TCPN leverages a temporal pyramidal recurrent network to capture historical dependencies across multiple temporal scales, thereby refining temporal feature representations over extended time spans. Furthermore, TCPN seamlessly integrates contrastive learning to effectively align historical information with query semantics relevant to future facts. Lastly, we incorporate an adaptive time-aware mechanism that uniformly models long- and short-term dependencies in time series at different granularities, explicitly fusing temporal feature information. Extensive experiments on four widely used TKG datasets show that TCPN significantly outperforms state-of-the-art methods across all metrics.
Unplug and Play Language Models: Decomposing Experts in Language Models at Inference Time
Enabled by large-scale text corpora and huge parameter counts, pre-trained language models operate as multi-task experts using a single model architecture. However, recent studies have revealed that certain neurons play disproportionately important roles in solving specific tasks, suggesting that task-relevant substructures can be isolated and selectively activated for each task. Therefore, we introduce Decomposition of Experts (DoE), a novel framework that dynamically identifies and activates task-specific experts within a language model to reduce inference cost without sacrificing accuracy. We first define a task expert as a set of parameters that significantly influence the performance of a specific task and propose a four-step unplug-and-play process: (1) receiving a user request, (2) identifying the corresponding task expert, (3) performing inference using the expert-localized model, and (4) restoring the original model and waiting for the next task. Using attribution methods and prompt tuning, DoE isolates task-relevant neurons, minimizing computational overhead while maintaining task performance. We assume a setting where a language model receives user requests from five widely used natural language understanding benchmarks, processing one task at a time. In this setup, we demonstrate that DoE achieves up to a 1.73× inference speed-up with a 65% pruning rate, without compromising accuracy. Comparisons with various task expert localization methods reveal that DoE effectively identifies task experts, while ablation studies validate the importance of its components. Additionally, we analyze the effects of batch size, token count, and layer types on inference speed-up, providing practical insights for adopting DoE. The proposed framework is both practical and scalable, applicable to any transformer-based architecture, offering a robust solution for efficient task-specific inference.
StreamingRT: Stream KNN Join with Ray Tracing Core
Efficient processing of k-nearest neighbor (kNN) join operations on streaming data is critical for applications in location-aware services, recommendation systems, and spatial analytics. To serve users in real time, these applications generally require a high-performance kNN join on continuously changing streaming data. This paper introduces StreamingRT, a framework that leverages ray tracing (RT) cores in GPUs to accelerate stream kNN joins in 3D space. By modeling stream data as large primitives and converting queries into short rays, StreamingRT transforms the kNN join problem into an efficient ray tracing task. To address the ray tracing index updating overhead on stream data, we propose two key techniques, i.e., boundary-extended point partitioning and query-driven BVH lazy updating. Moreover, we also adopt multi-BVH coprocessing and CPU-GPU pipelining to improve performance. These techniques enable efficient stream kNN join on ray tracing cores, delivering unprecedented performance improvements. Experimental evaluations show that StreamingRT can achieve up to 2.2× and 5.8× speedup over the state-of-the-art approach on RT cores and CUDA cores, respectively.
STEP: Stepwise Curriculum Learning for Context-Knowledge Fusion in Conversational Recommendation
Conversational recommender systems (CRSs) aim to proactively capture user preferences through natural language dialogue and recommend high-quality items. To achieve this, CRS gathers user preferences via a dialog module and builds user profiles through a recommendation module to generate appropriate recommendations. However, existing CRS faces challenges in capturing the deep semantics of user preferences and dialogue context. In particular, the efficient integration of external knowledge graph (KG) information into dialogue generation and recommendation remains a pressing issue. Traditional approaches typically combine KG information directly with dialogue content, which often struggles with complex semantic relationships, resulting in recommendations that may not align with user expectations. To address these challenges, we introduce STEP, a conversational recommender centered on pre-trained language models that combines curriculum-guided context-knowledge fusion with lightweight task-specific prompt tuning. At its heart, an F-Former progressively aligns the dialogue context with knowledge-graph entities through a three-stage curriculum, thus resolving fine-grained semantic mismatches. The fused representation is then injected into the frozen language model via two minimal yet adaptive prefix prompts: a conversation prefix that steers response generation toward user intent and a recommendation prefix that biases item ranking toward knowledge-consistent candidates. This dual-prompt scheme allows the model to share cross-task semantics while respecting the distinct objectives of dialogue and recommendation. Experimental results show that STEP outperforms mainstream methods in the precision of recommendation and dialogue quality in two public datasets. Our code is available: https://github.com/Alex-bupt/STEP.
GFlowNet with Gradient-based Optimization for Bayesian Network Structure Learning
Bayesian network (BN) structure learning from discrete observations is crucial for representing uncertainty in data. However, existing single-structure learning methods are commonly trapped in local optima or yield structures that may lead to poorly calibrated predictions. Although posterior approximation methods can quantify the epistemic uncertainty over the learned BN structures, they cannot reliably identify the optimal BN structures. To tackle these issues, we propose GFlowOpt, a GFlowNet with Gradient-based Optimization for BN structure learning. Initially, we employ Bayesian Information Criterion (BIC) scores as the rewards of BN structures and adopt a contrastive learning-based training technique on GFlowNet to efficiently generate much better DAG representations. Subsequently, we train a proxy model on the continuous DAG representations and adopt gradient-ascent-based optimization to search for discrete candidate DAGs with higher proxy scores. Finally, we adopt Hill Climbing (HC) on these candidate DAGs to search for high-scoring DAGs as the ultimate BN structures. Extensive experiments conducted on benchmark datasets demonstrate the superiority of our proposed method compared with other state-of-the-art methods.
PAnDA: Combating Negative Augmentation via Large Language Models for User Cold-Start Recommendations
The cold-start problem remains a long-standing challenge in recommender systems. Recent advances in large language models (LLMs) have opened new avenues for addressing cold-start scenarios through data augmentation. However, existing cold-start augmentation methods often suffer from negative augmentation, manifesting as incomplete augmentation, where generated interactions fail to comprehensively reflect user preferences, and inaccurate augmentation, where they conflict with user intent. These issues largely stem from two limitations: (1) the inability to effectively incorporate collaborative signals, which are critical for preference alignment, and (2) the lack of awareness of the downstream model's learning dynamics during data augmentation. To the best of our knowledge, the latter has not been studied in the literature. Consequently, we propose a novel framework named PAnDA. To address the incomplete augmentation issue, we propose a model-agnostic preference-aligned augmentation module to iteratively extract and fuse textual information and collaborative information by user-user preference matching and user-item preference coherence, which together form a contextual cue to guide the augmentor to generate high-quality augmented data. To overcome the inaccurate augmentation issue, we propose a model-specific downstream-model-aware adaptation module to adaptively align the augmented data with the model's states during the training process, guided by gradient similarity. Extensive experiments on three public benchmark datasets demonstrate that PAnDA outperforms different groups of state-of-the-art cold-start recommendation methods in all scenarios. The source code is publicly available at https://github.com/YantongDU/PAnDA.
An Embarrassingly Simple but Effective Knowledge-enhanced Recommender
Knowledge graphs (KG) have demonstrated significant potential in recommender systems by providing complementary semantic information that is typically absent in user-item interaction graphs (IG). While contrastive learning has emerged as a powerful paradigm for integrating these dual information sources, we identify a critical limitation in existing approaches: current methods fail to effectively balance the contrastive views derived from IG and KG, often resulting in performance degradation compared to using IG alone. To address this fundamental challenge, we propose SimKGCL, a novel contrastive learning framework that introduces a simple yet principled solution -- cross-view, layer-wise fusion between IG and KG representations prior to contrastive learning. This design ensures effective knowledge transfer while maintaining the discriminative power of contrastive objectives. Comprehensive experiments across three real-world benchmarks demonstrate that our approach not only consistently outperforms existing methods but also achieves remarkable efficiency gains. Our code is available through this link: https://figshare.com/articles/conference_contribution/SimKGCL/22783382.
SG-Filter: Enhancing Similar Text Retrieval via Hierarchical Summarized-Semantic Index and Adaptive Filtering
Similar Text Retrieval (STR) is an essential scenario in the field of information retrieval (IR). Unfortunately, existing mainstream vector-based retrieval methods cannot meet the recall rate requirements of STR scenarios (achieving a recall rate of less than 72%). This is because existing works have focused solely on the local information of text segments, that is, the text segments themselves (i.e., semantic information) and the relationships between them (i.e., structured information). Our key insight is that utilizing the global information of text segments (i.e., summarized information, which includes the key expressions of the documents to which the text segments belong and the relationships between documents) is crucial for improving the recall rate in STR, because the distinctiveness of summarized information helps filter out confusing vectors during retrieval. However, existing methods that use summarized information still face a critical challenge: their vectorization-based approaches fail to effectively model the global relationships in the summarized information, resulting in a further 79% deterioration in recall rate. To address these challenges, we present SG-Filter, a novel retrieval framework that integrates summarized information through a hierarchical summarized-semantic index and an adaptive filtering strategy applied on it. (1) We propose a hierarchical summarized-semantic index by designing a summarized graph to model the summarized information; specifically, we exploit global information at both the document and text-segment levels through co-occurrence relationships and semantic associations. (2) We propose an adaptive filtering strategy that automatically determines which summarized words to filter for each retrieval, enabling effective utilization of summarized information. (3) To ensure robustness and low retrieval latency, we propose a multi-path merge recall strategy that obtains summarized and semantic information at varying proportions, and develop an efficient vector retrieval method with filtering conditions. Experiments show that SG-Filter increases the recall rate by 10.53% to 22.92% on average compared with existing vector-based retrieval methods in STR, while keeping retrieval latency within tens of milliseconds. The code is open-sourced at https://github.com/strong-leaf/SG-Filter.
Towards Instance-wise Personalized Federated Learning via Semi-Implicit Bayesian Prompt Tuning
Federated learning (FL) is a privacy-preserving machine learning paradigm that enables collaborative model training across multiple distributed clients without disclosing their raw data. Personalized federated learning (pFL) has gained increasing attention for its ability to address data heterogeneity. However, most existing pFL methods assume that each client's data follows a single distribution and learn one client-level personalized model for each client. This assumption often fails in practice, where a single client may possess data from multiple sources or domains, resulting in significant intra-client heterogeneity and suboptimal performance. To tackle this challenge, we propose pFedBayesPT, a fine-grained instance-wise pFL framework based on visual prompt tuning. Specifically, we formulate instance-wise prompt generation from a Bayesian perspective and model the prompt posterior as an implicit distribution to capture diverse visual semantics. We derive a variational training objective under the semi-implicit variational inference framework. Extensive experiments on benchmark datasets demonstrate that pFedBayesPT consistently outperforms existing pFL methods under both feature and label heterogeneity settings.
Exploring Iterative Refinement for Nested Named Entity Recognition with IoU-aware Denoising Diffusion
Named entity recognition (NER) is a key task in natural language processing, but existing methods often fail to effectively handle nested structures due to fuzzy entity boundaries and structural ambiguity. To address this challenge, we propose a novel nested NER method based on an IoU-aware denoising diffusion model, which formulates the nested NER task as a generative denoising process that progressively recovers gold entity spans from noisy span proposals. We generate noisy samples during training by gradually adding Gaussian noise to the ground-truth entity boundaries. We then train a denoiser incorporating a top-k selective attention mechanism to refine entity span proposals iteratively. To strengthen the alignment between boundary localization and entity classification, we introduce an IoU-aware loss function that optimizes the overlap between predicted and ground-truth spans. This design more accurately guides boundary regression and effectively reduces misalignment caused by conventional regression losses. Our model leverages sentence features and timesteps as conditional inputs to capture contextual information throughout the denoising process. During inference, the model generates final entity predictions by starting from random noise spans and iteratively refining them through a multi-step reverse diffusion process. We conduct extensive experiments on four nested NER datasets, ACE2004, ACE2005, GENIA, and KBP2017, as well as two flat NER datasets, CoNLL2003 and OntoNotes. Experimental results show that the proposed method consistently outperforms existing advanced models across all benchmarks, demonstrating its effectiveness.
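For intuition, a 1-D span IoU and a toy IoU-weighted boundary loss can be sketched as follows; the weighting is an illustrative stand-in for the paper's IoU-aware loss, not its exact form.

```python
def span_iou(pred, gold):
    """IoU between 1-D token spans given as (start, end), with end inclusive."""
    inter = max(0, min(pred[1], gold[1]) - max(pred[0], gold[0]) + 1)
    union = (pred[1] - pred[0] + 1) + (gold[1] - gold[0] + 1) - inter
    return inter / union

def iou_aware_loss(pred, gold):
    """Toy objective: L1 boundary regression weighted by (1 - IoU), so poorly
    overlapping proposals are penalized more heavily."""
    l1 = abs(pred[0] - gold[0]) + abs(pred[1] - gold[1])
    return (1.0 - span_iou(pred, gold)) * l1

# A noisy span proposal (3, 7) against a gold entity span (4, 9).
print(round(span_iou((3, 7), (4, 9)), 3), round(iou_aware_loss((3, 7), (4, 9)), 3))
```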
Multi-Armed Bandits with Biased and Heteroscedastic Auxiliary Rewards
We study the multi-armed bandits with auxiliary rewards problem, in which pulling an arm yields not only a primary reward but also a set of auxiliary rewards that represent low-quality data. The auxiliary reward distribution can be biased and have a higher variance than the primary reward distribution. We analyze the regret lower bound with a general-order cumulative volume function, and deduce the conditions under which an algorithm can outperform the classical optimal regret bound without auxiliary rewards, attained by the state-of-the-art (SOTA) algorithm Asymptotically-Optimal-UCB (AO-UCB). We then propose the BVA-MIN-UCB algorithm, which carefully incorporates the primary and auxiliary rewards by adjusting for potential biases and differing variances. We show that BVA-MIN-UCB always performs no worse than AO-UCB asymptotically and nearly matches the regret lower bound, up to a constant factor. Finally, we conduct numerical experiments to demonstrate the effectiveness of our algorithm.
DPT: Dynamic Preference Transfer for Cross-Domain Sequential Recommendation
Cross-domain sequential recommendation aims to generate accurate recommendations by leveraging users' historical interactions across domains. However, existing methods have two limitations: 1) When transferring user's preferences from the source domain, they encode preferences into a static and holistic representation, ignoring the rich information inherent in the dynamic evolution of user preferences over time; 2) They adopt a distribution-agnostic full-transfer strategy, failing to effectively limit the transfer degree of source-domain preferences according to different data distributions, which poses a risk of negative transfer. To address these issues, we propose the Dynamic Preference Transfer (DPT) model. Unlike existing methods, DPT places greater emphasis on the dynamic transfer of real-time preferences. First, DPT captures the causal features through the causal self-attention mechanism, and then realizes dynamic preference transfer at each time step via the causal cross-attention mechanism, thereby tracking the temporal dynamics of preferences from source domains. Second, to mitigate the negative transfer issue, a temperature-controlled mechanism is designed to adaptively balance source and target domain preferences, leveraging a temperature-controlled sigmoid function to effectively suppress interference from irrelevant preferences. Experimental results on multiple benchmark datasets show that the proposed method achieves significant performance improvements compared with the state-of-the-art (SOTA) methods, verifying its effectiveness and superiority. The codes are available in https://github.com/iryand/DPT.
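A minimal sketch of a temperature-controlled sigmoid gate for balancing source- and target-domain preferences is shown below; the relevance score and fusion rule are assumptions for illustration only, not DPT's exact mechanism.

```python
import numpy as np

def temperature_gate(relevance, tau=0.2):
    """Temperature-controlled sigmoid: a small tau sharpens the gate so weakly
    relevant source-domain signals are pushed toward 0."""
    return 1.0 / (1.0 + np.exp(-relevance / tau))

def fuse_preferences(target_pref, source_pref, relevance, tau=0.2):
    """Blend target- and source-domain preference vectors at one time step."""
    g = temperature_gate(relevance, tau)
    return g * source_pref + (1.0 - g) * target_pref

rng = np.random.default_rng(5)
target, source = rng.normal(size=8), rng.normal(size=8)
for r in (-1.0, 0.0, 1.0):        # irrelevant, neutral, highly relevant source signal
    fused = fuse_preferences(target, source, r)
    print(r, "-> gate", round(float(temperature_gate(r)), 3), "fused[:3]", np.round(fused[:3], 2))
```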
FAIR-SE: Framework for Analyzing Information Disparities in Search Engines with Diverse LLM-Generated Personas
Search engine personalization, while enhancing user satisfaction, can lead to information disparities. Previous studies on this topic face limitations, such as the absence of context-aware data collection, superficial URL-level analysis, and human-dependent annotations. We propose FAIR-SE, a Framework for Analyzing Information dispaRities in Search Engines that addresses these challenges through AWS Lambda-based concurrent data collection and LLM-generated persona-based content analysis. We collected search results across four user contexts (Search History, Geo-location, Language Preference, and Access Environment) and analyzed them through four analytical perspectives (Political Leaning, Topic-specific Stance, Subjectivity, and Bias). Experiments conducted on two globally prominent search engines across nine controversial topics demonstrate the efficacy of FAIR-SE regarding benchmark accuracy, persona consistency, and ability to reflect real-world discourse patterns across diverse topics. Our statistical analysis identifies distinct search engine characteristics and demonstrates significant information disparities in our case studies examining regional disparities in search results. Our code and datasets are publicly available at: https://github.com/bigbases/FAIR-SE.
Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study
Retrieval-augmented generation (RAG) is increasingly recognized as an effective approach to mitigating the hallucination of large language models (LLMs) through the integration of external knowledge. Despite numerous efforts, most studies focus on a single type of external knowledge source. However, real-world applications often involve diverse knowledge from various sources, and this setting remains underexplored. The main obstacle is the lack of a suitable dataset containing multiple knowledge sources, along with little prior exploration of the associated issues. To address these challenges, we standardize a benchmark dataset that combines structured and unstructured knowledge across diverse and complementary domains. Based on this dataset, we further develop a plug-and-play RAG framework, PruningRAG, whose main characteristic is the use of multi-granularity pruning strategies to optimize the integration of relevant information while minimizing misleading context. It consistently improves performance across various existing RAG variants, demonstrating its robustness and broad applicability. Building upon the standardized dataset and PruningRAG, we also report a series of experimental results, as well as insightful findings. Our dataset and code are publicly available at https://github.com/USTCAGI/PruningRAG, with the aim of advancing future research in the RAG community.
Decoder-only Pre-training Enhancement for Spatio-temporal Traffic Forecasting
Although spatio-temporal graph neural networks (STGNNs) have become widely used methods in traffic forecasting, they still suffer from an issue named short-sightedness. Specifically, due to high model complexity and GPU memory usage, STGNNs are restricted to processing only very short input time series. This limited context often causes STGNNs to focus on local variations and overlook long-term patterns, leading to misinterpretation of time series trends. To tackle this issue, recent studies propose performing mask reconstruction pre-training on traffic series to enhance STGNNs. However, we argue that mask reconstruction is a suboptimal pre-training paradigm for traffic forecasting, because there exists a great gap between pre-training and downstream forecasting, caused by their inconsistent training targets. To eliminate this gap, we propose a new pre-training paradigm named next patch prediction and prove its advantages from both empirical and theoretical perspectives. Based on this paradigm, we introduce a new framework called Decoder-only Pre-training Enhancement (DoP) to unleash the potential of pre-trained traffic models. Specifically, DoP uses Transformer decoders as its backbone and leverages next patch prediction as the pre-training target. In addition, we propose a new dual-view temporal embedding to fully capture temporal information and a spatial spectral enhancement to model spatial information. After pre-training, DoP enhances existing STGNNs seamlessly with a periodic enhancement mechanism. On four real-world traffic benchmarks, we demonstrate its state-of-the-art performance.
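The next-patch-prediction idea can be sketched as follows: a causally masked Transformer predicts each patch of the series from the patches before it, so the pre-training target matches the forecasting objective. Patch length, model size, and the plain MSE loss are assumptions, and this is not the DoP architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NextPatchPredictor(nn.Module):
    """Minimal decoder-style next-patch prediction head (illustrative only)."""

    def __init__(self, patch_len=12, d_model=64):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, patch_len)

    def forward(self, series):                        # series: (B, L), L % patch_len == 0
        patches = series.reshape(series.size(0), -1, self.patch_len)   # (B, N, P)
        mask = nn.Transformer.generate_square_subsequent_mask(patches.size(1))
        h = self.decoder(self.embed(patches), mask=mask)   # causal: patch t sees <= t
        pred = self.head(h[:, :-1])                   # predict patches 1 .. N-1
        return F.mse_loss(pred, patches[:, 1:])

loss = NextPatchPredictor()(torch.randn(4, 96))       # 96-step series -> 8 patches
```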
StepTool: Enhancing Multi-Step Tool Usage in LLMs via Step-Grained Reinforcement Learning
Despite their powerful text generation capabilities, large language models (LLMs) still struggle to effectively utilize external tools to solve complex tasks, a challenge known as tool learning. Existing methods primarily rely on supervised fine-tuning, treating tool learning as a text generation problem while overlooking the decision-making complexities inherent in multi-step contexts. In this work, we propose modeling tool learning as a dynamic decision-making process and introduce StepTool, a novel step-grained reinforcement learning framework that enhances LLMs' capabilities in multi-step tool use. StepTool comprises two key components: Step-grained Reward Shaping, which assigns rewards to each tool interaction based on its invocation success and contribution to task completion; and Step-grained Optimization, which applies policy gradient methods to optimize the model across multiple decision steps. Extensive experiments across diverse benchmarks show that StepTool consistently outperforms both SFT-based and RL-based baselines in terms of task Pass Rate and Recall of relevant tools. Furthermore, our analysis suggests that StepTool helps models discover new tool-use strategies rather than merely re-weighting prior knowledge. These results highlight the importance of fine-grained decision modeling in tool learning and establish StepTool as a general and robust solution for enhancing multi-step tool use in LLMs. Code and data are available at https://github.com/yuyq18/StepTool.
Temporal Blocks with Memory Replay for Dynamic Graph Representation Learning
Dynamic graph representation learning (DGRL) aims to model the temporal evolution of graph structure and attributes, thereby generating low-dimensional node representations at different time steps. Most prevailing snapshot-based methods construct snapshots independently in time, assigning each interaction to a single snapshot. However, such a design limits the ability to capture long-range temporal patterns, leading to the forgetting of prior interactions and reducing the capacity of the model to recognize causal dependencies across events. To address this issue, we construct temporal blocks with the memory replay mechanism by sequentially merging several adjacent snapshots to capture long-range temporal patterns and causal dependencies over time. Building on this, we propose a novel dynamic graph representation learning model named TBD. Specifically, the model first encodes each temporal block using a graph neural network (GNN), and then captures cross-block dynamics through a Multi-Feature Gated Recurrent Unit (MF-GRU) that incorporates structural embeddings and a feature-aware gating mechanism to adapt to evolving graph structures. Furthermore, we introduce a Structure-Aware Node Smoothness Constraint (SA-NSC) to enforce temporal consistency while retaining adaptability to structural changes. Extensive experiments on multiple real-world datasets demonstrate that TBD consistently achieves superior performance, validating its effectiveness and robustness.
PKGRec: Personal Knowledge Graph Construction and Mining for Federated Recommendation Enhancement
Personal Knowledge Graphs (PKGs) organize an individual user's information into a structured format comprising entities, attributes, and relationships. By leveraging this structured and semantically rich data, PKGs have become essential for securing personal data management and delivering personalized services. To unlock their potential in personalized recommendations, prior research has explored the construction of PKGs and recommendation methods built upon them. However, these studies often overlook challenges associated with distributed PKGs across different users, such as joint training and privacy protection. To address these challenges, we propose PKGRec, a federated graph recommendation method specifically designed for PKGs, which utilizes a federated learning framework to ensure user privacy and data security during joint learning. Furthermore, to accommodate the user-centric graph structure of PKGs, our approach categorizes entities into three types: users, items, and other entities. It then applies a novel staged graph convolution method to model various entities based on these entity categories during local training. To enable efficient graph information sharing among distributed PKGs without requiring additional data transfer or aggregation, PKGRec performs graph expansion on the trained gradients by federated aggregation. Extensive experiments conducted on four publicly available datasets demonstrate that our method consistently outperforms the existing federated recommendation approaches.
SEF-UQR: Scalable and Efficient Privacy-Preserving Federated Updating QR Factorization
Applications in real-time machine learning and data analysis often require incremental updates to matrix decompositions as new data arrive. This capability is particularly crucial for streaming PCA, online learning, and iterative optimization algorithms, where data are continuously generated from distributed sources. However, privacy constraints prevent direct data sharing among participants, making collaborative QR decomposition updates challenging. To address this, we present SEF-UQR, a scalable and efficient framework for federated QR updates that focuses on incremental row-addition updates (common in streaming-data scenarios) while leveraging homomorphic encryption and interactive ciphertext protocols to protect both inputs and intermediate computations. SEF-UQR achieves accuracy on par with insecure recomputation, maintaining a mean squared error (MSE) below 1e-12. Empirical results demonstrate that SEF-UQR delivers at least a 10× runtime improvement over existing state-of-the-art methods employing fully homomorphic encryption, confirming its effectiveness for privacy-sensitive, real-time federated data analysis.
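For readers unfamiliar with row-addition QR updates, the plaintext linear-algebra step looks like the sketch below; the secure protocol that SEF-UQR runs under homomorphic encryption is not reproduced here, and the verification at the end is only an illustration.

```python
import numpy as np

def qr_add_row(Q, R, new_row):
    """Plaintext sketch of the incremental row-addition QR update.

    If A = Q @ R (reduced form), appending a row a gives
    [A; a] = blkdiag(Q, 1) @ [R; a], so re-factorizing only the small
    (n+1) x n block [R; a] updates the decomposition at a cost independent
    of the number of existing rows.
    """
    m, n = Q.shape
    Qs, R_new = np.linalg.qr(np.vstack([R, new_row]))        # (n+1, n), (n, n)
    blk = np.block([[Q, np.zeros((m, 1))],
                    [np.zeros((1, n)), np.eye(1)]])          # blkdiag(Q, 1)
    return blk @ Qs, R_new

A = np.random.rand(10000, 8)
Q, R = np.linalg.qr(A)                                       # reduced QR
a = np.random.rand(8)
Q2, R2 = qr_add_row(Q, R, a)
print(np.allclose(Q2 @ R2, np.vstack([A, a])))               # True
```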
Aggregated Gradients-based Adaptive Learning Rate Design in Federated Learning
Federated Learning (FL) has emerged as a crucial distributed training paradigm, enabling distributed devices to collaboratively train a shared model while leveraging their locally stored private data. However, the non-independent-and-identically-distributed (Non-IID) data on heterogeneous clients may significantly impede training efficacy. In our study, we present a novel algorithm designed to alleviate client drift on Non-IID data and enhance model performance, termed FedAgile (Aggregated Gradients-based AdaptIve LEarning Rate Design in FEDerated Learning), which designs the adaptive learning rate by introducing an aggregated gradient term to accelerate model convergence and mean-field terms to approximate the average local information over time. We further refine the learning rate based on the Jensen-Shannon (JS) distance to enhance generalization capability. Through rigorous theoretical analysis, we establish the existence and convergence of the mean-field terms, which can be efficiently calculated via our proposed iterative algorithm with linear computational complexity. Furthermore, we provide a robust upper bound on the convergence of FedAgile and prove that our algorithm achieves the linear convergence rate of Õ(T⁻¹). Extensive experimental results on real-world datasets substantiate the superiority of the proposed FedAgile over existing state-of-the-art FL strategies; moreover, FedAgile can be easily incorporated into existing methods to further enhance model performance.
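To make the Jensen-Shannon-based refinement concrete, here is a hypothetical scaling rule (not FedAgile's actual update): a client whose local label distribution lies far from the global one, as measured by JS distance, takes smaller steps to limit client drift. The scaling form and the knob `beta` are assumptions.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_scaled_lr(base_lr, local_dist, global_dist, beta=1.0):
    """Shrink a client's learning rate by its JS distance to the global
    label distribution (illustrative rule only)."""
    d = jensenshannon(local_dist, global_dist, base=2)   # JS distance in [0, 1]
    return base_lr / (1.0 + beta * d)

print(js_scaled_lr(0.1, [0.70, 0.20, 0.10], [0.33, 0.33, 0.34]))   # skewed client
print(js_scaled_lr(0.1, [0.34, 0.33, 0.33], [0.33, 0.33, 0.34]))   # balanced client
```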
TriSeRec: A Tri-view Representation Learning Framework for Sequential/Session-based Recommendation
Sequential/session-based recommendation models aim to learn evolving user preferences from historical user behaviors. State-of-the-art sequential/session-based recommendation models often use graph neural networks or self-attention as their building blocks. Graph neural networks excel at learning local patterns encoded in graph-structured data and have therefore shown great performance on session-based recommendation datasets, where user interactions are usually relatively short. Self-attentive models, on the other hand, are much more powerful in capturing long-range dependencies and are able to outperform graph neural network-based approaches on sequential recommendation, where longer user interactions are more frequent. As such, the recommender systems community has noted a lack of a unified framework that can simultaneously achieve great performance on both sequential and session-based recommendation. In an effort to fill this gap, this paper presents TriSeRec, a Tri-view representation learning framework for Sequential/session-based Recommendation. By converting interaction sequences into two graphical views and one sequential view, three view-specific user representations are learned by TriSeRec using graph neural networks and self-attention. The tri-view representation learning module, which is built upon the recently proposed generalized Cauchy-Schwarz divergence, disentangles and then fuses consistent and complementary information in all three views to form the final user representations for next-item predictions. Experiments on popular large-scale, real-world benchmark datasets show that TriSeRec achieves state-of-the-art performance on both sequential recommendation and session-based recommendation.
A Comparative Analysis of Linguistic and Retrieval Diversity in LLM-Generated Search Queries
Large Language Models (LLMs) are increasingly used to generate search queries for various Information Retrieval (IR) tasks. However, it remains unclear how these machine-generated queries compare to human-written ones, particularly in terms of diversity and alignment with real user behavior. This paper presents an empirical comparison of LLM- and human-generated queries across multiple dimensions, including lexical diversity, linguistic variation, and retrieval effectiveness. We analyze queries produced by several LLMs and compare them with human queries from two datasets collected five years apart. Our findings show that while LLMs can generate diverse queries, their patterns differ from those observed in human behavior. LLM queries typically exhibit higher surface-level uniqueness but rely less on stopword use and word form variation. They also achieve lower retrieval effectiveness when judged against human queries, suggesting that LLM-generated queries may not always reflect real user intent. These differences highlight the limitations of current LLMs in replicating natural querying behavior. We discuss the implications of these findings for LLM-based query generation and user behavior simulation in IR. We conclude that while LLMs hold potential, they should be used with caution.
Querier-Aware LLM: Generating Personalized Responses to the Same Query from Different Queriers
Existing work on large language model (LLM) personalization has assigned different responding roles to LLMs but has overlooked the diversity of queriers. In this work, we propose a new form of querier-aware LLM personalization, generating different responses even for the same query from different queriers. We design a dual-tower model architecture with a cross-querier general encoder and a querier-specific encoder. We further apply contrastive learning with multi-view augmentation, pulling together the dialogue representations of the same querier while pushing apart those of different queriers. To mitigate the impact of query diversity on querier-contrastive learning, we cluster the dialogues based on query similarity and restrict the scope of contrastive learning within each cluster. To address the lack of datasets designed for querier-aware personalization, we also build a multi-querier dataset, MQDialog, from English and Chinese scripts as well as WeChat records, containing 173 queriers and 12 responders. Extensive evaluations demonstrate that our design significantly improves the quality of personalized response generation, achieving relative improvements of 8.4% to 48.7% in ROUGE-L scores and winning rates ranging from 54% to 82% compared with various baseline methods.
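The contrastive objective can be pictured with the standard InfoNCE loss below: two views of dialogues from the same querier form positives, other rows act as negatives. This is the generic formulation, not the paper's cluster-restricted variant, and the batch layout is an assumption.

```python
import torch
import torch.nn.functional as F

def querier_infonce(z1, z2, temperature=0.1):
    """Generic InfoNCE over querier representations (illustrative).

    z1[i] and z2[i] are two augmented views of dialogues from the same
    querier; other rows in the batch serve as negatives (different queriers).
    """
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature              # (B, B) similarity matrix
    labels = torch.arange(z1.size(0))               # positives on the diagonal
    return F.cross_entropy(logits, labels)

loss = querier_infonce(torch.randn(16, 128), torch.randn(16, 128))
print(loss.item())
```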
CityLight: A Neighborhood-inclusive Universal Model for Coordinated City-scale Traffic Signal Control
City-scale traffic signal control (TSC) involves thousands of heterogeneous intersections with varying topologies, making cooperative decision-making across intersections particularly challenging. Given the prohibitive computational cost of learning individual policies for each intersection, some researchers explore learning a universal policy to control each intersection in a decentralized manner, where the key challenge is to construct a universal representation method for heterogeneous intersections. However, existing methods are limited to universally representing information of heterogeneous ego intersections, neglecting the essential representation of influence from their heterogeneous neighbors. Universally incorporating neighborhood information is nontrivial due to the intrinsic complexity of traffic flow interactions, as well as the challenge of modeling collective influences from neighbor intersections. To address these challenges, we propose CityLight, which learns a universal policy based on representations obtained with two major modules: a Neighbor Influence Encoder that explicitly models each neighbor's influence with its specified traffic flow relation and connectivity to the ego intersection, and a Neighbor Influence Aggregator that attentively aggregates the influence of neighbors based on their mutual competitive relations. Extensive experiments on five city-scale datasets, ranging from 97 to 13,952 intersections, confirm the efficacy of CityLight, with an average throughput improvement of 11.68% and a lift of 22.59% in generalization. Our code and datasets are released at https://github.com/tsinghua-fib-lab/CityLight.
SGPT: Few-Shot Prompt Tuning for Signed Graphs
Signed Graph Neural Networks (SGNNs) are effective in learning expressive representations for signed graphs but typically require substantial task-specific labels, limiting their applicability in label-scarce industrial scenarios. In contrast, unsigned graph structures are abundant and can be readily leveraged to pre-train Graph Neural Networks (GNNs), offering a promising solution to reduce supervision requirements in downstream signed graph tasks. However, transferring knowledge from unsigned to signed graphs is non-trivial due to the fundamental discrepancies in graph types and task objectives between pre-training and downstream phases. To address this challenge, we propose Signed Graph Prompt Tuning (SGPT), a novel graph prompting framework that adapts pre-trained unsigned GNNs to few-shot signed graph tasks. We first design a graph template based on balance theory to disentangle mixed node relationships introduced by negative links, mitigating the structural mismatches between unsigned and signed graphs. We further introduce a task template that reformulates downstream signed tasks into a unified link prediction objective, aligning their optimization goals with the pre-training task. Furthermore, we develop feature prompts that align downstream semantic spaces with the feature spaces learned during pre-training, and semantic prompts to integrate link sign semantics in a task-aware manner. We conduct extensive experiments on seven benchmark signed graph datasets, demonstrating that SGPT significantly outperforms existing state-of-the-art methods, establishing a powerful and generalizable solution for few-shot signed graph learning.
DSETA: Driving Style-Aware Estimated Time of Arrival
The accurate estimated time of arrival (ETA) is crucial for mobility and transportation applications. Although significant efforts have been made to improve ETA prediction, most existing approaches ignore the influence of individual driving habits and preferences, known as the driving style. Since different drivers may prefer specific routes and speeds based on their experience and familiarity with traffic conditions, driving styles play a crucial role in determining the actual ETA. To fill this gap, we present a novel approach, DSETA, which leverages deep learning to learn and then integrate driving style representations for personalized and precise ETA predictions. Our method employs a diffusion model that captures nuanced driving styles by generating driving speed distribution. We also utilize attention mechanisms to dynamically adjust the impacts of various spatio-temporal factors and driving styles on ETA predictions. Additionally, we introduce a Multi-View Multi-Task framework that incorporates auxiliary tasks, including segment-view driving style classification and route-view speed distribution prediction, to enhance the ETA learning process. A route-level speed prior regularization strategy further improves the model's generalization capabilities. Extensive experiments conducted on a large real-world trip trajectory dataset demonstrate that DSETA achieves high effectiveness and outperforms various baselines across multiple evaluation metrics.
A Hierarchical Structure-Enhanced Personalized Recommendation Model for Traditional Chinese Medicine Formulas Based on KG Diffusion Guidance
Artificial intelligence (AI) technology plays a crucial role in recommending prescriptions for traditional Chinese medicine (TCM). Previous studies have made significant progress by focusing on the symptom-herb relationship in prescriptions. However, several limitations hinder model performance: (i) Insufficient attention to patient-personalized information such as age, BMI, and medical history, which hampers accurate identification of syndromes and reduces efficacy. (ii) The typical long-tailed distribution of herb data introduces training biases and affects generalization ability. (iii) Overlooking the 'monarch, minister, assistant and envoy' compatibility among herbs increases the risk of toxicity or side effects, contravening the 'treatment based on syndrome differentiation' principle in clinical TCM. Therefore, we propose a novel hierarchical structure-enhanced personalized recommendation model for TCM formulas based on knowledge graph (KG) diffusion guidance, namely TCM-HEDPR. Specifically, we pre-train symptom representations using patient-personalized prompt sequences and apply prompt-oriented contrastive learning (CL) for data augmentation. Furthermore, we employ a KG-guided homogeneous graph diffusion method integrated with a self-attention mechanism to globally capture the non-linear symptom-herb relationship. Lastly, we design a heterogeneous graph hierarchical network to integrate herbal dispensing relationships with implicit syndromes, guiding the prescription generation process at a fine-grained level and mitigating the long-tailed herb data distribution problem. Extensive experiments on two public datasets and one clinical dataset demonstrate the effectiveness of TCM-HEDPR. In addition, we incorporate insights from modern medicine and network pharmacology to evaluate the recommended prescriptions comprehensively. This work can provide a new paradigm for modern TCM prescription recommendation.
HRCformer: Hierarchical Recursive Convolution-Transformer with Multi-Scale Adaptive Recalibration for Time Series Forecasting
Time series forecasting has significant applications across various domains, including industry, agriculture, and finance. Transformer-based models have shown significant promise in enhancing time series forecasting over the past few years. However, existing methods struggle to simultaneously capture local details and global semantics under single-view architectures. They also find it difficult to dynamically adapt to time-varying and multi-scale temporal patterns while accurately modeling the complex, time-varying relationships between multiple variables. To address these challenges, we propose HRCformer, a novel Transformer-based framework that introduces two key innovations: the Hierarchical Recursive Interaction Convolution (HRIC) and the Triad Adaptive Recalibration Module (TARM). HRIC achieves joint modeling of fine-grained short-term fluctuations and high-order cross-period dependencies in time series by integrating Divide-and-Process Convolution for local processing with Recursive Channel Interaction Convolution for global processing. TARM further enhances dynamic modeling via Dynamic Variance Attention, which amplifies critical temporal deviations through 3D attention, and the Adaptive Multivariate Recalibration, which uses a two-layer fully connected network with nonlinear activation to learn the dynamic relationships between channels, suppresses noise, and emphasizes informative multivariate interactions. Comprehensive experiments conducted on seven real-world datasets highlight the superiority of HRCformer compared to prior state-of-the-art methods.
EvalAgent: Towards Evaluating News Recommender Systems with LLM-based Agents
Online news platforms have become the primary source of information consumption, with recommender systems serving as critical gateways that shape public discourse through their algorithmic power, necessitating rigorous evaluation methodologies. Traditional offline evaluation methods struggle with evolving user behavior and dynamic system adaptation, while online experiments are costly, time-consuming, and ethically challenging. To address these challenges, this paper introduces EvalAgent, a large language model agent system for simulating real-world online news recommender systems. EvalAgent employs Stable Memory (StM) to model users' exploration-exploitation dynamics, mitigating noise from irrelevant interactions by analyzing the distribution density of news articles within the short-term memory, and incrementally maintains the long-term memory to capture users' high-level preferences, thereby enabling a consistent and reliable simulation of sustained interactions. It further incorporates an Environment Interaction Framework (EIF) to enable seamless engagement with real-world recommender systems. This approach yields a precise, scalable, and ethically responsible evaluation framework for news recommender systems. Comprehensive experiments and user studies substantiate EvalAgent's efficacy, with publicly available code to support ongoing research in recommender system evaluation.
SpeedSteiner: A Fast O(k^{1/2})-Approximation Algorithm for Directed Steiner Tree
The directed Steiner tree problem is fundamental in computer science with numerous applications. However, to date, there are no efficient algorithms with quality guarantees. In this paper, we take on this challenge and offer a fast algorithm with provable approximation guarantees. We introduce SpeedSteiner, an O(k^{1/2})-approximation algorithm, where k is the number of terminal nodes. In practice, SpeedSteiner can be several orders of magnitude faster than other methods with a similar approximation ratio. The speedup is achieved by combining several optimization techniques that exploit the inner structure of recursive-greedy algorithms. We systematically evaluate the proposed algorithm and verify its scalability and strong empirical performance.
Harnessing Commonsense: LLM-Driven Knowledge Integration for Fine-Grained Sentiment Analysis
Fine-grained sentiment analysis, which aims to identify sentiments associated with specific aspects within sentences, faces challenges in effectively incorporating commonsense knowledge. Recent advancements leveraging large language models (LLMs) as data generators show promise but are limited by the LLMs' lack of nuanced, domain-specific understanding and pose a significant risk of data leakage during inference, potentially leading to inflated performance metrics. To address these limitations, we propose LLM-Kit, a novel framework for commonsense-enhanced fine-grained sentiment analysis that integrates knowledge via LLM-guided graph construction, effectively mitigating data leakage risks. LLM-Kit operates in two key stages: (1) Commonsense Graph Construction (CGC): We design second-order rules and leverage LLMs for evaluation to ensure the accuracy of the generated graph and mitigate the risk of data leakage from LLMs. (2) Knowledge-integration Graph Representation Learning (KGRL): We extract knowledge that is aware of various aspects through Graph Representation Learning (GRL). To capture the underlying semantic nuances within the input sentence, we develop a Sentence Semantic Learning (SSL) module based on RoBERTa that explicitly encodes internal semantics. This module provides complementary information to the GCN, improving the model's ability to discern subtle sentiment variations related to different aspects. Comprehensive experiments on three public datasets affirm that LLM-Kit achieves comparable performance with state-of-the-art models.
SyLeR: A Framework for Explicit Syllogistic Legal Reasoning in Large Language Models
Syllogistic reasoning is a fundamental aspect of legal decision-making, enabling logical conclusions by connecting general legal principles with specific case facts. Although existing large language models (LLMs) can generate responses to legal questions, they fail to perform explicit syllogistic reasoning, often producing implicit and unstructured answers that lack explainability and trustworthiness. To address this limitation, we propose SyLeR, a novel framework that empowers LLMs to engage in explicit syllogistic legal reasoning. SyLeR integrates a tree-structured hierarchical retrieval mechanism to effectively combine relevant legal statutes and precedent cases, forming comprehensive major premises. This is followed by a two-stage fine-tuning process: supervised fine-tuning warm-up establishes a foundational understanding of syllogistic reasoning, while reinforcement learning with a structure-aware reward mechanism refines the model's ability to generate diverse, logically sound, and well-structured reasoning paths. We conducted extensive experiments across various dimensions, including in-domain and cross-domain user groups (legal laypersons and practitioners), multiple languages (Chinese and French), and different LLM backbones (legal-specific and open-domain LLMs). The results show that SyLeR significantly improves response accuracy and consistently delivers explicit, explainable, and trustworthy legal reasoning.
Distribution-Guided Auto-Encoder for User Multimodal Interest Cross Fusion
Traditional recommendation methods model a user's interest in a target item by correlating its embedding with the embeddings of items from the user's interaction history, thereby capturing implicit collaborative filtering signals. Consequently, traditional ID-based methods often encounter data sparsity problems stemming from the sparse nature of ID features. To mitigate this issue, recommendation models incorporate multimodal item information to enhance recommendation accuracy. However, existing multimodal recommendation methods typically rely on early fusion approaches, which focus primarily on combining text and image features, while neglecting the dynamic context provided by user behavior sequences. This oversight precludes the dynamic adaptation of multimodal interest representations to behavioral patterns, thereby hindering the model's ability to effectively capture user multimodal interests. Therefore, this paper proposes the Distribution-Guided Multimodal-Interest Auto-Encoder (DMAE), which achieves the cross fusion of user multimodal interest at the behavioral level. Specifically, DMAE comprises three key components: 1) Multimodal Interest Encoding Unit (MIEU), which encodes the similarity scores between the target item and historically clicked items as the corresponding representation vectors of user interest across different modalities. 2) Multimodal Interest Fusion Unit (MIFU), which dynamically adapts these interest representations through both intra- and inter-modal fusion, a process contextualized by the user's behavioral sequence to achieve a fine-grained and behavior-aware representation of interest. 3) Interest-Distribution Decoding Unit (IDDU), which employs a decoder to reconstruct the encoded user interest representations into true similarity distributions for each modality. The similarity distributions serve as a guide for model learning, aiming to retain as much multimodal information as possible. Ultimately, extensive experiments demonstrate the superiority of DMAE.
HGAurban: Heterogeneous Graph Autoencoding for Urban Spatial-Temporal Learning
Spatial-temporal graph representations play a crucial role in urban sensing applications, including traffic analysis, human mobility behavior modeling, and citywide crime prediction. However, a key challenge lies in the noisy and sparse nature of spatial-temporal data, which limits existing neural networks' ability to learn meaningful region representations in the spatial-temporal graph. To overcome these limitations, we propose HGAurban, a novel heterogeneous spatial-temporal graph masked autoencoder that leverages generative self-supervised learning for robust urban data representation. Our framework introduces a spatial-temporal heterogeneous graph encoder that extracts region-wise dependencies from multi-source data, enabling comprehensive modeling of diverse spatial relationships. Within our self-supervised learning paradigm, we implement a masked autoencoder that jointly processes node features and graph structure. This approach automatically learns heterogeneous spatial-temporal patterns across regions, significantly improving the representation of dynamic temporal correlations. Comprehensive experiments across multiple spatiotemporal mining tasks demonstrate that our framework outperforms state-of-the-art methods and robustly handles real-world urban data challenges, including noise and sparsity in both spatial and temporal dimensions.
GCoder: Improving Large Language Model for Generalized Graph Reasoning
Large Language Models (LLMs) have demonstrated remarkable progress across a variety of reasoning tasks. Among these, graph-related tasks-which require the integration of multiple reasoning paradigms and whose complexity increases with graph size-present unique challenges for LLMs. Existing research primarily centers on chain-of-thought (CoT) methods and code-based approaches. While CoT methods have shown promise in many domains, they often underperform in graph reasoning due to unverifiable reasoning steps, computational inaccuracies, and limited generalization capabilities. Code-based approaches, which leverage LLMs to generate executable programs, offer an alternative paradigm by offloading complex computations to external tools. However, current LLMs still face challenges related to closed-source restrictions, deployment difficulties, code quality, and generalization. To address these limitations, we propose GCoder, a code-based LLM specifically designed to enhance performance in generalized graph reasoning tasks. Our approach includes the construction of a comprehensive training dataset, GraphWild, which encompasses a wide range of graph formats and algorithms. We employ a multi-stage post-training process, incorporating Supervised Fine-Tuning (SFT) and Reinforcement Learning from Compiler Feedback (RLCF), to further refine the model's capabilities. For unseen tasks, a hybrid retrieval strategy is utilized to boost performance. Experimental results demonstrate that GCoder outperforms GPT-4o, achieving an average accuracy improvement of 13.29% across 13 different graph reasoning problems. Additionally, GCoder efficiently handles large-scale graphs with millions of nodes and diverse input formats. This advancement paves the way for more intuitive and effective graph reasoning using LLMs. Our code is available at https://github.com/Bklight999/GCoder.
IPNet: An Interaction Pattern-aware Neural Network for Temporal Link Prediction
Temporal link prediction, which aims to predict the future status of edges between target nodes, is vital for current prevalent online services. Most existing methods ignore node-level behavior patterns, which play a decisive role in temporal link prediction, as nodes that behave similarly are more likely to interact in the future. In this paper, we propose a novel continuous-time model, the Interaction Pattern-aware neural Network (IPNet), to capture node-level behavior patterns and network evolution by encoding interaction sequences and contextual windows. We further devise a random walk sampling strategy to enhance the extraction of these windows, preserving node-centric structural evolution. Experimental results on seven real-world networks demonstrate that IPNet outperforms state-of-the-art methods in both transductive and inductive link prediction tasks. The code can be accessed via https://github.com/CoderZQY/IPNet.
Advancing Graph Isomorphism Tests with Metric Space Indicators: A Tool for Improving Graph Learning Tasks
To enhance the capability of Graph Neural Networks (GNNs) in graph isomorphism judgment and graph classification tasks, this paper introduces a metric space-based graph isomorphism judgment method called the k-MSI test, which offers more topological information than the k-WL test and demonstrates superior graph isomorphism judgment capability at the same complexity level. On the open graph isomorphism test dataset BREC, the accuracy of our k-MSI test is more than 11% higher than that of other methods. Furthermore, based on the k-MSI test, we propose a feature enhancement method, the Node Metric Indicator (NMI), which supplies additional topological information about graphs for GNNs, and present a novel GNN named the Metric Space Indicators Graph Neural Network (MSIGNN). Experimental results indicate that the NMI feature-based MSIGNN outperforms state-of-the-art methods on the BREC graph isomorphism test dataset and achieves satisfactory performance on publicly available real-world graph classification benchmarks.
SarRec: Statistically-guaranteed Augmented Retrieval for Recommendation
Large Language Models with Retrieval-Augmented Generation (RAG) have recently emerged as a powerful paradigm for sequential recommendation. However, existing methods typically retrieve items for each user without any principled mechanism for guaranteeing the reliability of generated recommendations, limiting their trustworthiness. To address this, we introduce SarRec: Statistically-guaranteed Augmented Retrieval for Recommendations, a framework that uses a simple retrieval step to provide relevant context and delivers calibrated, uncertainty-aware predictions with formal statistical guarantees. Specifically, SarRec first constructs the user's context set, utilizing a lightweight differentiable retrieval mechanism for identifying relevant context, and then calibrates the LLM's outputs by adapting the conformal prediction mechanism. We further provide a theoretical analysis that establishes an upper bound on the expected risk of recommendation performance metrics. Extensive experiments on multiple datasets from different domains validate the effectiveness of our framework.
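The calibration idea can be illustrated with vanilla split conformal prediction over item scores, sketched below; SarRec adapts this mechanism to the LLM setting, and the score model, dataset sizes, and nonconformity choice here are assumptions.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def conformal_item_sets(cal_scores, cal_labels, test_scores, alpha=0.1):
    """Split conformal prediction over item scores (generic sketch).

    `cal_scores` are model scores for held-out calibration users over all
    items; `cal_labels` index each user's true next item. Under
    exchangeability, each returned set contains the true item with
    probability >= 1 - alpha.
    """
    n = len(cal_labels)
    nonconf = 1.0 - softmax(cal_scores)[np.arange(n), cal_labels]   # 1 - p(true item)
    q = np.sort(nonconf)[int(np.ceil((n + 1) * (1 - alpha))) - 1]   # conformal quantile
    test_p = softmax(test_scores)
    return [np.where(1.0 - p <= q)[0] for p in test_p]              # item set per user

sets = conformal_item_sets(np.random.randn(500, 50),
                           np.random.randint(0, 50, size=500),
                           np.random.randn(10, 50))
print([len(s) for s in sets])
```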
ECLIPSE: Efficient Cross-Lingual Log Intelligence Parser with Semantic Entropy-Enhanced LCS Algorithm
Log parsing is essential in software engineering but is challenged by the immense complexity of log templates and diverse cross-platform and cross-lingual log semantics and structures in industrial logs. We propose ECLIPSE, an Efficient Cross-platform and Cross-lingual Log Intelligent Parsing framework with Semantic Entropy-Enhanced Longest Common Subsequence algorithm in industrial Environments. ECLIPSE leverages large language models to extract log keywords and maintains a dynamic dictionary mapping these keywords to log templates. When parsing, it retrieves candidate templates based on the keywords and log length. We design an algorithm named Semantic Entropy-Enhanced Longest Common Subsequence (Entropy-ELCS) for identifying the best template, improving token-level accuracy by incorporating information entropy and semantic elements into the longest common subsequence algorithm. The dictionary is updated with new keywords and templates for continuous improvement. Experiments on public benchmarks and our industrial log parsing benchmark ECLIPSE-BENCH demonstrate that ECLIPSE achieves strong performance and superior efficiency, especially when handling large template sets.
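At the core of such parsers is matching a log line against candidate templates by longest common subsequence, as in the plain sketch below; ECLIPSE's entropy and semantic weighting of tokens is not reproduced, and the templates shown are made up for illustration.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence between two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def best_template(log_line, templates):
    """Pick the template with the highest LCS ratio against the log tokens."""
    tokens = log_line.split()
    def score(t):
        tt = t.split()
        return lcs_len(tokens, tt) / max(len(tokens), len(tt))
    return max(templates, key=score)

templates = ["Connection from <*> closed", "Failed login for user <*> from <*>"]
print(best_template("Failed login for user alice from 10.0.0.7", templates))
```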
TopKNet: Learning to Perceive the Top-K Pivotal Nodes in Spatio-Temporal Data for Traffic Forecasting
Traffic prediction is a crucial research area in spatio-temporal forecasting. The key to traffic forecasting lies in the effective modelling of complex dependencies in spatio-temporal graph as well as capturing spatio-temporal heterogeneity. The majority of existing methodologies harness fully connected graphs derived from spatio-temporal graph neural networks or transformer-based models to model intricate spatio-temporal dependencies. However, it is paramount to recognize that not every node within this spatio-temporal graph contributes equally to the modeling of such dependencies. Consequently, the ability to discern and effectively leverage the significant nodes within the graph holds the key to enhancing the accuracy of traffic prediction models. In this paper, we center our attention on the pivotal nodes within the spatio-temporal graph and propose a novel method called TopKNet for effective traffic forecasting. Specifically, we introduce Time-Aware TopK Attention and TopK GCN for pivotal nodes within the temporal and spatial dimensions respectively. Moreover, Time-Aware Spatial Identity Embedding and Heterogeneity-Aware Loss are designed to characterise the spatio-temporal heterogeneity of nodes. Experiments on six real-world traffic datasets verify our proposed method's effectiveness compared to state-of-the-art baselines. These results offer fresh perspectives and insights that can enrich the endeavors of subsequent researchers working on the design and optimization of traffic models. The code will be made public at the following website https://github.com/randomforest1111/TopKNet.
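A generic top-k attention operation conveys the core idea: scores outside each query's k largest are masked before the softmax, so only the most relevant (pivotal) nodes contribute. This sketch omits TopKNet's time-aware components, and the shapes and k value are assumptions.

```python
import torch
import torch.nn.functional as F

def topk_attention(q, k, v, topk=8):
    """Attention restricted to each query's top-k highest-scoring keys."""
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5      # (B, N, N)
    kth = scores.topk(topk, dim=-1).values[..., -1:]           # k-th largest per query
    masked = scores.masked_fill(scores < kth, float("-inf"))   # drop non-pivotal nodes
    return F.softmax(masked, dim=-1) @ v

out = topk_attention(torch.randn(2, 32, 16), torch.randn(2, 32, 16),
                     torch.randn(2, 32, 16), topk=4)
print(out.shape)   # torch.Size([2, 32, 16])
```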
LLM4CD: Leveraging Large Language Models for Open-World Knowledge Augmented Cognitive Diagnosis
Cognitive diagnosis (CD) plays a crucial role in intelligent education, evaluating students' comprehension of knowledge concepts based on their test histories. However, current CD methods often model students, exercises, and knowledge concepts solely on their ID relationships, neglecting the abundant semantic relationships present within the educational data space. Furthermore, contemporary intelligent tutoring systems (ITS) frequently involve the addition of new students and exercises, creating cold-start scenarios that ID-based methods find challenging to manage effectively. The advent of large language models (LLMs) offers the potential for overcoming this challenge with open-world knowledge. In this paper, we propose LLM4CD, which Leverages Large Language Models for open-world knowledge Augmented Cognitive Diagnosis. Our method utilizes the open-world knowledge of LLMs to construct cognitively expressive textual representations, which are then encoded to introduce rich semantic information into the CD task. Additionally, we propose an innovative bi-level encoder framework that models students' test histories through two levels of encoders: a macro-level cognitive text encoder and a micro-level knowledge state encoder. This approach substitutes traditional ID embeddings with semantic representations, enabling the model to accommodate new students and exercises with open-world knowledge and address the cold-start problem. Extensive experimental results demonstrate that LLM4CD consistently outperforms previous CD models on multiple real-world datasets, validating the effectiveness of leveraging LLMs to introduce rich semantic information into the CD task.
FEDDGCN: A Frequency-Enhanced Decoupling Dynamic Graph Convolutional Network for Traffic Flow Prediction
As a core task in Intelligent Transportation Systems (ITS), traffic flow prediction is essential for resource allocation and real-time route planning. Effectively capturing complex temporal correlations and dynamic spatial dependencies in traffic flow data is critical yet challenging for accurate prediction. However, existing approaches are still limited by insufficient capability for spatial-temporal pattern decoupling and underutilization of frequency domain information. To address these issues, we propose a novel Frequency-Enhanced Dynamic Decoupling Graph Convolutional Network (FEDDGCN), which introduces a gated decoupling mechanism integrating temporal and spatial embeddings to decouple traffic flow into prominent periodic and perturbative components. It also achieves effective pattern separation by incorporating frequency domain analysis with Fourier filters. Furthermore, a dual-branch spatial-temporal learning module, employing a divide-and-conquer strategy, is designed to model the two distinct components separately. Specifically, dynamic graph convolution modules are utilized to learn spatial dependencies, while temporal and frequency attention mechanisms further capture complex temporal correlations for the prominent periodic and perturbative components. Extensive experiments on multiple real-world datasets demonstrate that FEDDGCN achieves superior predictive performance compared with state-of-the-art methods.
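A simple fixed Fourier filter illustrates the kind of frequency-domain decoupling involved: keep the strongest frequencies as the periodic component and treat the remainder as the perturbation. FEDDGCN learns its filters; the hard top-frequency cutoff and the synthetic series below are assumptions.

```python
import numpy as np

def fourier_decouple(x, keep=5):
    """Split a traffic series into a dominant-periodic part and a residual
    by keeping the `keep` largest-magnitude Fourier coefficients."""
    spec = np.fft.rfft(x)
    idx = np.argsort(np.abs(spec))[::-1][:keep]      # strongest frequencies
    periodic_spec = np.zeros_like(spec)
    periodic_spec[idx] = spec[idx]
    periodic = np.fft.irfft(periodic_spec, n=len(x))
    return periodic, x - periodic                    # periodic, perturbative parts

t = np.arange(288)                                   # one day at 5-minute steps
flow = 100 + 30 * np.sin(2 * np.pi * t / 288) + np.random.randn(288) * 5
periodic, perturbation = fourier_decouple(flow)
```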
Tide: A Time-Wise Causal Debiasing Framework for Generative Dynamic Link Prediction
Dynamic link prediction aims to predict future links in dynamic graphs. Existing generative dynamic link prediction studies utilize the global degree distribution to mitigate the over-estimation problem, which can model time-invariant features while neglecting time-varying features, resulting in capturing inaccurate evolution patterns. However, such time-related features are intrinsically coupled, which makes simultaneously and independently modeling both features infeasible. Motivated by these issues, we propose a Time-wise causal debiasing framework (Tide) for generative dynamic link prediction, which does not resort to any extra trainable modules. Instead, to obtain the time-invariant features, we first utilize a time-invariant deconfounded learning mechanism to decouple the prediction score from the degree distribution. To leverage the time-varying features, we intervene in the model during the inference stage with a predicted future degree distribution, aiming to make accurate predictions for dynamic graphs. Experiments conducted on four public datasets under both inductive and transductive settings show that our Tide-enhanced models outperform their corresponding vanilla versions by up to 21.42% and 27.73% in terms of NDCG and Jaccard, respectively.
Hyperbolic Prompt Learning for Incremental Event Detection with LLMs
Class-incremental event detection (CIED) is essential for real-world information extraction systems, which must continually recognize new event types without forgetting past knowledge. The main challenge lies in balancing stability and adaptability under data imbalance. Existing methods often underuse the hierarchical and syntactic structures of language, and thus limit the generalization capacity. We propose HPLLM, a hyperbolic prompt-enhanced large language model framework, motivated by the observation that both embedding distributions and dependency graphs in event datasets exhibit hyperbolic properties. HPLLM integrates two key components: (1) Hyperbolic LoRA fine-tuning, enabling geometry-aware parameter adaptation for hierarchical semantics; and (2) Hyperbolic Adaptive Graph Diffusion Convolution (HADC), which encodes syntactic dependencies into structure-aware prompts for LLMs. Together, these techniques strengthen semantic discrimination, reduce forgetting, and improve adaptation across incremental stages. Extensive experiments on ACE2005 and MAVEN demonstrate that HPLLM consistently surpasses state-of-the-art baselines in macro-F1, achieving stronger retention of old knowledge and better generalization to new event types. In particular, the model shows clear gains on rare categories with few training mentions, demonstrating its robustness in imbalanced and few-shot regimes.
Traffic Safety Evaluation Based on Macroscopic Traffic Features in Road Tunnels
Traffic accidents are one of the leading causes of death in the world. As an important part of the design of traffic roads, tunnels bring convenience but also pose huge safety risks. To monitor road safety in real time and give timely warnings to drivers in tunnels, where the light is dark, the space is limited, and the signal is unstable, we study the problem of traffic safety evaluation based on macroscopic traffic features in road tunnels. In particular, we transform the problem into a four-class classification problem. To overcome the long collection cycle of traffic crash data, we use the time-to-collision index as the standard for dividing safety levels of road sections in tunnels. To achieve the goal of collecting data in real time under the environmental constraints of tunnels, we use macroscopic traffic features as input to our model. Specifically, we design a deep learning model in which the lane block extracts the interaction information of sequential road segments in the same lane, and the prediction block integrates the results of the individual prediction of each lane and the overall prediction. An extensive empirical study with real data offers insight into the effectiveness and efficiency of the proposed model.
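For context, the time-to-collision index and its mapping to discrete safety levels can be written as below; the threshold values are illustrative assumptions, not the paper's calibrated ones.

```python
def time_to_collision(gap_m, speed_follower_ms, speed_leader_ms):
    """Time-to-collision (s): gap divided by closing speed; inf if not closing."""
    closing = speed_follower_ms - speed_leader_ms
    return gap_m / closing if closing > 0 else float("inf")

def safety_level(ttc, thresholds=(1.5, 3.0, 5.0)):
    """Map TTC to one of four safety levels (thresholds are illustrative)."""
    for level, th in enumerate(thresholds):
        if ttc <= th:
            return level            # 0 = most dangerous
    return len(thresholds)          # 3 = safest

print(safety_level(time_to_collision(gap_m=20, speed_follower_ms=25, speed_leader_ms=15)))
```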
TAGA: Text-Attributed Graph Self-Supervised Learning by Synergizing Graph and Text Mutual Transformations
Text-Attributed Graphs (TAGs) enhance graph structures with natural language descriptions, enabling detailed representation of data and their relationships across a broad spectrum of real-world scenarios. Despite the potential for deeper insights, existing TAG representation learning methods largely overlook the semantic relationships among node texts and mostly rely on supervised methods, necessitating extensive labeled data and limiting applicability across diverse contexts. This paper introduces a new self-supervised learning framework, Text-Attributed-Graph Multi-View Alignment (TAGA), which overcomes these constraints by integrating TAGs' structural and semantic dimensions. TAGA constructs two complementary views: the Text-of-Graph view, which organizes node texts into structured documents based on graph topology, and the Graph-of-Text view, which converts textual nodes and connections into graph data. By aligning representations from both views, TAGA captures joint textual and structural information. In addition, a novel structure-preserving random walk algorithm is proposed for efficient training on large TAGs. Our framework demonstrates strong performance in zero-shot and few-shot scenarios across eight real-world datasets.
Transferable Deep Clustering Model
Deep learning has recently shown remarkable success in the field of clustering. However, how to transfer a clustering model trained on a source domain to a target domain by leveraging the acquired knowledge to guide the clustering process remains challenging. Existing deep clustering methods often lack generalizability to new domains because they typically learn a group of fixed cluster centroids, which may not be optimal for the new domain distributions. In this paper, we propose a novel transferable deep clustering model that can automatically adapt the cluster centroids according to the distribution of data samples. Rather than learning a fixed set of centroids, our approach introduces a novel attention-based module that can adapt the centroids by measuring their relationship with samples. In addition, we theoretically show that our model is strictly more powerful than some classical clustering algorithms such as k-means and the Gaussian Mixture Model (GMM). Experimental results on both synthetic and real-world datasets demonstrate the effectiveness and efficiency of our proposed transfer learning framework, which significantly improves performance on the target domain and reduces the computational cost.
A Privacy-preserving Spatial Dataset Joinable Search in Cloud
In the era of big data, the demand for spatial dataset search has become increasingly urgent. Leveraging the powerful storage and computing capabilities of cloud platforms, the cloud has become a common choice for deploying dataset search services. However, under the risks of untrusted cloud environments and malicious attacks, protecting the privacy of sensitive location information during spatial dataset search becomes particularly critical. This paper focuses on the problem of privacy-preserving joinable search over spatial datasets in the cloud, which has not been addressed in existing research. We first propose a grid-based joinable coverage distinction model to measure the joinability of spatial datasets, and further present a baseline scheme (PDJDS). To further enhance efficiency and reduce storage cost, we propose an optimized scheme (PDJDS+), which constructs a coarse-grained grid-based inverted index to filter candidate datasets and integrates a joinable coverage distinction check table to expedite the evaluation of spatial dataset coverage distinction. Experiments conducted on three real-world spatial data repositories demonstrate that our scheme achieves superior performance in terms of search accuracy, efficiency, and storage cost.
Exploring Causal Effect of Social Bias on Faithfulness Hallucinations in Large Language Models
Large language models (LLMs) have achieved remarkable success in various tasks, yet they remain vulnerable to faithfulness hallucinations, where the output does not align with the input. In this study, we investigate whether social bias contributes to these hallucinations, a causal relationship that has not been explored. A key challenge is controlling confounders within the context, which complicates the isolation of causality between bias states and hallucinations. To address this, we utilize the Structural Causal Model (SCM) to establish and validate the causality and design bias interventions to control confounders. In addition, we develop the Bias Intervention Dataset (BID), which includes various social biases, enabling precise measurement of causal effects. Experiments on mainstream LLMs reveal that biases are significant causes of faithfulness hallucinations, and the effect of each bias state differs in direction. We further analyze the scope of these causal effects across various models, specifically focusing on unfairness hallucinations, which are primarily targeted by social bias, revealing the subtle yet significant causal effect of bias on hallucination generation.
Yes is Harder than No: A Behavioral Study of Framing Effects in Large Language Models Across Downstream Tasks
Framing effect is a well-known cognitive bias in which individuals' responses to the same underlying question vary depending on how the question is phrased. Recent studies suggest that large language models (LLMs) also exhibit framing effects, but existing work has primarily replicated psychological experiments using hand-crafted prompts, leaving their impact on practical downstream tasks underexplored. To fill this gap, we conduct a systematic empirical investigation into framing effects in LLMs across multiple real-world downstream tasks. We construct semantically equivalent prompts with positive and negative framings and evaluate a wide range of LLMs under these conditions. We uncover several behavioral regularities of framing effects in LLMs, the most notable of which is a consistent response asymmetry: LLMs find answering "yes" harder than "no". That is, LLMs tend to issue affirmative responses (i.e., "yes") only when they are highly confident, while they are inclined to answer negatively (i.e., "no") under uncertainty. We interpret this asymmetry through the lens of Error Management Theory (EMT), which posits that rational agents adopt risk-averse strategies to minimize the more costly error. We empirically show that this behavior is partially attributable to a statistical imbalance in the frequency of positive versus negative framing cues in pretraining corpora. Furthermore, we demonstrate that the framing-induced bias in LLMs can inform prompt engineering and active in-context learning, i.e., using framing-sensitive samples as demonstrations can improve model performance. Finally, we offer a preliminary strategy to mitigate the framing effect, i.e., injecting debiasing instructions, which shows promise. In all, our work uncovers a fundamental behavioral bias in LLMs and offers practical guidance for their reliable deployment across downstream tasks.
Unbiased Reasoning for Knowledge-Intensive Tasks in Large Language Models via Conditional Front-Door Adjustment
Large Language Models (LLMs) have shown impressive capabilities in natural language processing but still struggle to perform well on knowledge-intensive tasks that require deep reasoning and the integration of external knowledge. Although methods such as Retrieval-Augmented Generation (RAG) and Chain-of-Thought (CoT) have been proposed to enhance LLMs with external knowledge, they still suffer from internal bias in LLMs, which often leads to incorrect answers. In this paper, we propose a novel causal prompting framework, Conditional Front-Door Prompting (CFD-Prompting), which enables the unbiased estimation of the causal effect between the query and the answer, conditional on external knowledge, while mitigating internal bias. By constructing counterfactual external knowledge, our framework simulates how the query behaves under varying contexts, addressing the challenge that the query is fixed and is not amenable to direct causal intervention. Compared to the standard front-door adjustment, the conditional variant operates under weaker assumptions, enhancing both robustness and generalisability of the reasoning process. Extensive experiments across multiple LLMs and benchmark datasets demonstrate that CFD-Prompting significantly outperforms existing baselines in both accuracy and robustness.
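For reference, the textbook front-door adjustment, written here in its conditional form with W as the conditioning variable (the external knowledge, in the abstract's terms) and Z as the mediator, is the standard identity from the causal-inference literature rather than the paper's exact estimator:

```latex
P\bigl(y \mid \mathrm{do}(x), w\bigr)
  \;=\; \sum_{z} P(z \mid x, w) \sum_{x'} P\bigl(y \mid x', z, w\bigr)\, P(x' \mid w)
```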
Robust Heterogeneous GNNs via Semantic Attention and Contrastive Learning
Heterogeneous Graph Neural Networks (HGNNs) have achieved significant success in various graph-related applications. However, their vulnerability to adversarial attacks remains insufficiently studied, particularly when malicious modifications are made to the graph structure, such as the addition of redundant edges or the removal of key semantic edges. These perturbations not only interfere with the aggregation of neighboring information but also compromise the semantic integrity of meta-paths, leading to a significant degradation in model performance. Although previous studies have preliminarily revealed the impact of structural perturbations on HGNN performance, there has been insufficient exploration into how to develop efficient defense mechanisms from both structural and semantic perspectives. To address this, we propose a novel defense framework that integrates a meta-path-guided semantic-aware attention mechanism. This mechanism dynamically adjusts edge weights to suppress noisy connections and enhance those that are structurally and semantically significant. Additionally, to compensate for the limited expressive power of raw features, we introduce a contrastive learning strategy that combines local and global structural augmentations to guide the model in learning perturbation-invariant representations in a self-supervised manner. Extensive experiments on multiple real-world heterogeneous graph datasets demonstrate that our proposed method significantly improves the robustness of HGNNs against adversarial attacks, showing both effectiveness and generalization capability.
Hybrid2: Distributed GNN Training System Enhanced by Dual-Hybrid for Sampling and Loading
Graph Neural Networks (GNNs) are the rising standard for graph tasks, yet their distributed training in servers or computing power networks remains challenging. Cross-machine sampling and data loading often create bottlenecks, leading to inefficient resource utilization. In this paper, we present Hybrid², a distributed GNN training system that combines full-graph and mini-batch training through a novel hybrid-batch training method. It also adopts hybrid feature extraction, leveraging both local caching and remote access to improve feature retrieval efficiency. The integration of these methods in Hybrid² results in a dual hybrid-gain effect. First, it reduces sampling and loading overhead by pre-aggregating neighbors for each target vertex, minimizing the layers to sample and load. Second, it accelerates data loading by dynamically identifying and locally caching the most frequently accessed vertices during training, maximizing memory efficiency. Experimental results demonstrate that Hybrid² brings substantial performance improvements across key components of distributed GNN training. Network communication overhead is reduced by up to tens of times, while both sampling and loading achieve at least several-fold speedups. These gains contribute to an overall training acceleration exceeding 20× compared to DistDGL, all with comparable GPU memory usage and no loss in accuracy. Compared to the state-of-the-art system, it achieves nearly 3× speedup while using fewer resources.
CEM: A Data-Efficient Method for Large Language Models to Continue Evolving From Mistakes
Large Language Models (LLMs) have achieved remarkable success, but their static nature leads to inherent limitations and persistent mistakes in dynamic real-world scenarios. While Continual Instruction Tuning (CIT) and Continual Pre-training (CPT) are primary continual learning approaches, they struggle with scalable knowledge acquisition and maintaining model capabilities. To address these, we propose the Continue Evolving from Mistakes (CEM) method, a novel and data-efficient framework for continuous LLM evolution. Inspired by human learning, CEM establishes an iterative process: it efficiently collects targeted CPT data by robustly identifying LLM mistakes and uncertainties (via an Ambiguity-Aware Knowledge Collection (AAKC) algorithm), and employs a novel joint training paradigm that leverages CIT and CPT to assimilate knowledge efficiently while maintaining existing capabilities and mitigating catastrophic forgetting. Extensive experiments confirm CEM's effectiveness, yielding substantial gains for multiple models and increasing accuracy by up to 29.63%. Code and datasets are available at https://anonymous.4open.science/r/cem-BB25.
Antelope: Potent and Concealed Jailbreak Attack Strategy
Due to the remarkable generative potential of diffusion-based models, numerous studies have investigated jailbreak attacks targeting these frameworks. A particularly concerning threat within image models is the generation of Not-Safe-for-Work (NSFW) content. Despite the implementation of security filters, numerous efforts continue to explore ways to circumvent these safeguards. Current attack methodologies primarily encompass adversarial prompt engineering or concept obfuscation, yet they frequently suffer from slow search efficiency, conspicuous attack characteristics, and poor alignment with targets. To overcome these challenges, we propose Antelope, a more robust and covert jailbreak attack strategy designed to expose security vulnerabilities inherent in generative models. Specifically, Antelope leverages the confusion of sensitive concepts with similar ones, facilitates searches in the semantically adjacent text space of these related concepts and aligns them with the target imagery, thereby generating sensitive images that are consistent with the target and capable of evading detection. Besides, we successfully exploit the transferability of model-based attacks to penetrate online black-box services. Experimental evaluations demonstrate that Antelope outperforms existing baselines across multiple defensive mechanisms, underscoring its efficacy and versatility. Disclaimer: This paper contains unsafe imagery that might be offensive to some readers.
ReCode: Improving LLM-based Code Repair with Fine-Grained Retrieval-Augmented Generation
Recent advances in large language models (LLMs) have demonstrated impressive capabilities in code-related tasks such as code generation and automated program repair. Despite their promising performance, most existing approaches for code repair suffer from high training costs or computationally expensive inference. Retrieval-augmented generation (RAG), with its efficient in-context learning paradigm, offers a more scalable alternative. However, conventional retrieval strategies, which are often based on holistic code-text embeddings, fail to capture the structural intricacies of code, resulting in suboptimal retrieval quality. To address the above limitations, we propose ReCode, a fine-grained retrieval-augmented in-context learning framework designed for accurate and efficient code repair. Specifically, ReCode introduces two key innovations: (1) an algorithm-aware retrieval strategy that narrows the search space using preliminary algorithm type predictions; and (2) a modular dual-encoder architecture that separately processes code and textual inputs, enabling fine-grained semantic matching between input and retrieved contexts. Furthermore, we propose RACodeBench, a new benchmark constructed from real-world user-submitted buggy code, which addresses the limitations of synthetic benchmarks and supports realistic evaluation. Experimental results on RACodeBench and competitive programming datasets demonstrate that ReCode achieves higher repair accuracy with significantly reduced inference cost, highlighting its practical value for real-world code repair scenarios.
FreeGAD: A Training-Free yet Effective Approach for Graph Anomaly Detection
Graph Anomaly Detection (GAD) aims to identify nodes that deviate from the majority within a graph, playing a crucial role in applications such as social networks and e-commerce. Despite the current advancements in deep learning-based GAD, existing approaches often suffer from high deployment costs and poor scalability due to their complex and resource-intensive training processes. Surprisingly, our empirical findings suggest that the training phase of deep GAD methods, commonly perceived as crucial, may actually contribute less to anomaly detection performance than expected. Inspired by this, we propose FreeGAD, a novel training-free yet effective GAD method. Specifically, it leverages an affinity-gated residual encoder to generate anomaly-aware representations. Meanwhile, FreeGAD identifies anchor nodes as pseudo-normal and anomalous guides, followed by calculating anomaly scores through anchor-guided statistical deviations. Extensive experiments demonstrate that FreeGAD achieves superior anomaly detection performance, efficiency, and scalability on multiple benchmark datasets from diverse domains, without any training or iterative optimization.
Adapting LLMs for Personalized Evaluation of Explanations for Recommendations: A Meta-Learning Approach based on MAML
Providing explanations to justify recommendations enhances user satisfaction and trust. Despite significant research on explanation generation methods, evaluating their quality remains a critical yet under-explored challenge. Although large language models (LLMs) have been used for automated evaluation of explanations, existing approaches fail to account for the highly personalized nature of explanation assessment, where user judgments towards the same explanations vary significantly. To address this, we propose the MAML+PEFT method, which combines Model-Agnostic Meta-Learning (MAML) with LoRA-based parameter-efficient tuning to adapt LLMs for personalized explanation evaluation. Building on this, we introduce TSA-MAML (Task Similarity Aware MAML)+PEFT, which clusters users based on their estimated optimal model parameters and learns group-specific meta models by leveraging implicit group distributions of user preferences. Experiments on synthetic and human-annotated datasets demonstrate superior alignment of MAML-based methods with human ratings in both generalization and few-shot adaptation settings. Additionally, we examine the correlation of MAML-based LLM-simulated human ratings with real online user behaviors on a large-scale recommendation platform, demonstrating the practical utility of our methods for real-world explainable recommendation systems.
ClariLM: Enhancing Open-domain Clarification Ability for Large Language Models
Active understanding and clarification of user intent is crucial for information-seeking systems based on Large Language Models (LLMs), as it enhances search efficiency and improves user experience for human-LLM interaction. While existing systems rely on domain-specific resources to generate clarifying questions, they face challenges when extended to open-domain scenarios due to the lack of human-LLM clarification data. In this paper, we propose ClariLM to synthesize large-scale clarification data and enhance the LLMs' clarification capability. Specifically, we design two key stages to prepare data: first, given a user question, the Clarification Facet Detection (CFD) stage employs a facet mining model learned from human-LLM conversation logs to predict realistic potential clarification candidates. Additionally, it incorporates direct predictions from powerful LLMs as supplements to guarantee comprehensive facet coverage. While CFD ensures high recall of facet candidates, the subsequent Optimal Facet Selection (OFS) stage synthesizes a set of new questions and employs a reasoning model to annotate the optimal facet for each question, which further improves the precision of ClariLM in clarification necessity prediction and optimal facet selection. The collected data are then applied for supervised fine-tuning, followed by constructing preference data for preference optimization. Experiments on our custom test set and two public benchmarks demonstrate that ClariLM significantly outperforms various baseline models across clarification necessity, clarifying question quality, and GPT-4-based comparative evaluation.
FollowGPT: A Framework of Follow-up Question Generation for Large Language Models via Conversation Log Mining
During interactions between users and Large Language Models (LLMs), users often engage in multi-turn questioning. Understanding the user's potential follow-up intents and generating follow-up question candidates for the user is crucial for enhancing their experience with LLMs. Existing methods for follow-up question generation mainly rely on hand-crafted rules, the internal knowledge of LLMs, or the integration of external knowledge. However, these approaches fail to effectively leverage real-world user follow-up intents when interacting with LLMs, resulting in generated questions that do not meet the needs of practical scenarios. In this paper, we propose FollowGPT, a model that mines user follow-up intents from user-LLM conversational logs. However, directly introducing raw conversation logs leads to significant noise and sparsity issues. Therefore, to address the noise, FollowGPT adopts a hierarchical filtering strategy for data cleaning. To mitigate the sparsity issue, FollowGPT employs data synthesis methods to augment the log data across three dimensions: topic diversity, intent transition diversity, and negative sample diversity. The processed data is then consolidated into a new dataset named ShareFQG for both training and evaluation. Finally, we train FollowGPT using a two-stage training framework involving supervised fine-tuning and preference optimization. In our experiments, we evaluate on both the ShareFQG test set and a publicly available dataset, FollowupQG, using both automated metrics and GPT-4o-based comparative evaluation. The experimental results show that our method outperforms existing baselines across various metrics, including lexical similarity, semantic similarity, and GPT-4-based evaluation for follow-up question generation, demonstrating FollowGPT's effectiveness.
Autonomous Reasoning-Retrieval for Large Language Model Based Recommendation
Recently, large language models (LLMs) have been introduced into recommender systems (RSs) as recommendation backbones or to enhance traditional recommendation models (TRMs). However, existing LLM-based RSs fail to fully leverage the complementary strengths of LLMs (e.g., world knowledge and reasoning capabilities) and TRMs (e.g., recommendation-specific knowledge and computational efficiency), resulting in shallow exploration of the item space. To address this limitation, we propose DeepRec, a novel LLM-based RS approach that facilitates autonomous multi-turn interactions between LLMs and TRMs for deep item space exploration. In each interaction turn, LLMs reason over user preferences and collaborate with TRMs to retrieve candidate items. After multi-turn interaction, LLMs rank the aggregated candidates to generate the final recommendations. We utilize reinforcement learning (RL) for optimization and introduce novel contributions in three key aspects: recommendation model based data rollout, recommendation-oriented hierarchical rewards, and a two-stage RL training strategy. For data rollout, we design a preference-aware TRM, with which LLMs interact to construct trajectory data. For reward design, we propose a hierarchical reward function that comprises both process-level and outcome-level rewards to optimize the interaction process and recommendation quality, respectively. For RL training, our two-stage RL strategy first guides LLMs to learn effective interactions with TRMs, followed by recommendation-oriented RL for performance enhancement. Experiments on public datasets show that DeepRec substantially outperforms both traditional and existing LLM-based baselines, establishing a new paradigm for deep exploration in recommender systems.
Adaptive Spline Networks in the Kolmogorov-Arnold Framework: Knot Analysis and Stability Enhancement
Kolmogorov-Arnold Neural Networks (KANs) have recently attracted significant attention in the machine learning community. However, their practical implementation often faces challenges such as poor training stability and a large number of trainable parameters. Moreover, the behavior of learnable activation functions based on B-splines remains insufficiently understood. In this work, we analyze KANs through the lens of spline knot behavior and derive lower and upper bounds on the number of knots in B-spline-based KANs. To address the existing limitations, we propose a novel KAN-based approach, which improves upon the original KAN by reducing the number of trainable parameters to match the scale of standard Multi-Layer Perceptrons (MLPs), while enhancing overall performance. Additionally, we introduce a new training strategy that enforces C2 continuity in the learnable splines, leading to smoother activation functions and improved training stability via range expansion. We evaluate our method across eight diverse datasets encompassing image, text, time series, multimodal, and function approximation tasks. The promising results demonstrate the feasibility of KAN-based architectures and the effectiveness of our proposed enhancements. The implementation of the proposed method is released at https://github.com/IcurasLW/FR-KAN.git
Modeling Edge-Specific Node Features through Co-Representation Neural Hypergraph Diffusion
Hypergraphs are widely employed to represent complex higher-order relations in real-world applications. Most existing research on hypergraph learning focuses on node-level or edge-level tasks. A practically relevant and more challenging task, edge-dependent node classification (ENC), is still under-explored. In ENC, a node can have different labels across different hyperedges, which requires the modeling of node features unique to each hyperedge. The state-of-the-art ENC solution, WHATsNet, only outputs single node and edge representations, leading to the limitations of entangled edge-specific features and non-adaptive representation sizes when applied to ENC. Additionally, WHATsNet suffers from the common oversmoothing issue in most HGNNs. To address these limitations, we propose CoNHD, a novel HGNN architecture specifically designed to model edge-specific features for ENC. Instead of learning separate representations for nodes and edges, CoNHD reformulates within-edge and within-node interactions as a hypergraph diffusion process over node-edge co-representations. We develop a neural implementation of the proposed diffusion process, leveraging equivariant networks as diffusion operators to effectively learn the diffusion dynamics from data. Extensive experiments demonstrate that CoNHD achieves the best performance across all benchmark ENC datasets and several downstream tasks without sacrificing efficiency. Our implementation is available at https://github.com/zhengyijia/CoNHD.
MI4Rec: Pretrained Language Model based Cold-Start Recommendation with Meta-Item Embeddings
Recently, pretrained large language models (LLMs) have been widely adopted in recommendation systems to leverage their textual understanding and reasoning abilities to model user behaviors and suggest future items. A key challenge in this setting is that items on most platforms are not included in the LLM's training data. Therefore, existing methods often fine-tune LLMs by introducing auxiliary item tokens to capture item semantics. However, in real-world applications such as e-commerce and short video platforms, the item space evolves rapidly, which gives rise to a cold-start setting, where many newly introduced items receive little or even no user engagement. This poses challenges in both learning accurate item token embeddings and generalizing efficiently to accommodate the continual influx of new items. In this work, we propose a novel meta-item token learning strategy to address both these challenges simultaneously. Specifically, we introduce MI4Rec, an LLM-based approach for recommendation that uses just a few learnable meta-item tokens and an LLM encoder to dynamically aggregate meta-items based on item content. We show that this paradigm allows highly efficient and accurate learning in such challenging settings. Extensive experiments on Yelp and Amazon reviews datasets demonstrate the effectiveness of MI4Rec in both warm-start and cold-start recommendations. Notably, MI4Rec achieves an average performance improvement of 20.4% in Recall and NDCG compared to the best-performing baselines. The implementation of MI4Rec is available at https://github.com/zhengzaiyi/MI4Rec
EvoFormer: Learning Dynamic Graph-Level Representations with Structural and Temporal Bias Correction
Dynamic graph-level embedding aims to capture structural evolution in networks, which is essential for modeling real-world scenarios. However, existing methods face two critical yet under-explored issues: Structural Visit Bias, where random walk sampling disproportionately emphasizes high-degree nodes, leading to redundant and noisy structural representations; and Abrupt Evolution Blindness, the failure to effectively detect sudden structural changes due to rigid or overly simplistic temporal modeling strategies, resulting in inconsistent temporal embeddings. To overcome these challenges, we propose EvoFormer, an evolution-aware Transformer framework tailored for dynamic graph-level representation learning. To mitigate Structural Visit Bias, EvoFormer introduces a Structure-Aware Transformer Module that incorporates positional encoding based on node structural roles, allowing the model to globally differentiate and accurately represent node structures. To overcome Abrupt Evolution Blindness, EvoFormer employs an Evolution-Sensitive Temporal Module, which explicitly models temporal evolution through a sequential three-step strategy: (I) Random Walk Timestamp Classification, generating initial timestamp-aware graph-level embeddings; (II) Graph-Level Temporal Segmentation, partitioning the graph stream into segments reflecting structurally coherent periods; and (III) Segment-Aware Temporal Self-Attention combined with an Edge Evolution Prediction task, enabling the model to precisely capture segment boundaries and perceive structural evolution trends, effectively adapting to rapid temporal shifts. Extensive evaluations on five benchmark datasets confirm that EvoFormer achieves state-of-the-art performance in graph similarity ranking, temporal anomaly detection, and temporal segmentation tasks, validating its effectiveness in correcting structural and temporal biases. Code is available at https://github.com/zlx0823/EvoFormerCode.
Budget and Frequency Controlled Cost-Aware Model Extraction Attack on Sequential Recommenders
Sequential recommenders are integral to many applications yet remain vulnerable to model extraction attacks, in which adversaries can recover information about the deployed model by issuing queries to a black-box without internal access. From the attacker's perspective, existing studies impose a fixed and limited query budget but overlook optimal allocation, resulting in redundant or low-value requests. Furthermore, the scarce data obtained through these costly queries is typically handled by crude random sampling, resulting in low diversity and limited information coverage relative to the actual data. In this paper, we propose a novel approach, named Budget and Frequency Controlled Cost-Aware Model Extraction Attack (BECOME), for extracting black-box sequential recommenders, which extends the standard extraction framework with two cost-aware innovations: Feedback-Driven Dynamic Budgeting periodically evaluates the victim model to refine query allocation and steer sequence generation adaptively, while Rank-Aware Frequency Controlling integrates frequency constraints with ranking guidance in the next-item sampler to select high-value items and broaden information coverage. Experiments on public datasets and representative sequential recommender architectures demonstrate that our method achieves superior extraction performance. Our code is released at https://github.com/Loche2/BECOME.
Enhancing Dual-Target Cross-Domain Recommendation via Similar User Bridging
Dual-target cross-domain recommendation aims to mitigate data sparsity and enable mutual enhancement via bidirectional knowledge transfer. Most existing methods rely on overlapping users to build cross-domain connections. However, in many real-world scenarios, overlapping data is extremely limited, or even entirely absent, significantly diminishing the effectiveness of these methods. To address this challenge, we propose SUBCDR, a novel framework that leverages large language models (LLMs) to bridge similar users across domains, thereby enhancing dual-target cross-domain recommendation. Specifically, we introduce a Multi-Interests-Aware Prompt Learning mechanism that enables LLMs to generate comprehensive user profiles, disentangling domain-invariant interest points while capturing fine-grained preferences. Then, we construct intra-domain bipartite graphs from user-item interactions and an inter-domain heterogeneous graph that links similar users across domains. Subsequently, to facilitate effective knowledge transfer, we employ Graph Convolutional Networks (GCNs) for intra-domain relationship modeling and design an Inter-domain Hierarchical Attention Network (InterHAN) to facilitate inter-domain knowledge transfer through similar users, learning both shared and specific user representations. Extensive experiments on seven public datasets demonstrate that SUBCDR outperforms state-of-the-art cross-domain recommendation algorithms and single-domain recommendation methods. Our code is publicly available at https://github.com/97z/SUBCDR.git.
BALM-TSF: Balanced Multimodal Alignment for LLM-Based Time Series Forecasting
Time series forecasting is a long-standing and highly challenging research topic. Recently, driven by the rise of large language models (LLMs), research has increasingly shifted from purely time series methods toward harnessing textual modalities to enhance forecasting performance. However, the vast discrepancy between text and temporal data often leads current multimodal architectures to over-emphasise one modality while neglecting the other, resulting in information loss that harms forecasting performance. To address this modality imbalance, we introduce BALM-TSF (Balanced Multimodal Alignment for LLM-Based Time Series Forecasting), a lightweight time series forecasting framework that maintains balance between the two modalities. Specifically, raw time series are processed by the time series encoder, while descriptive statistics of the raw time series are fed to an LLM with a learnable prompt, producing compact textual embeddings. To ensure balanced cross-modal context alignment of time series and textual embeddings, a simple yet effective scaling strategy combined with a contrastive objective then maps these textual embeddings into the latent space of the time series embeddings. Finally, the aligned textual semantic embeddings and time series embeddings are integrated for forecasting. Extensive experiments on standard benchmarks show that, with minimal trainable parameters, BALM-TSF achieves state-of-the-art performance in both long-term and few-shot forecasting, confirming its ability to harness complementary information from text and time series. Code is available at https://github.com/ShiqiaoZhou/BALM-TSF.
Calibrated and Diverse News Coverage
In recent years, there has been a debate about whether automated news aggregators, like Google News, lead readers to content that reinforces their existing beliefs and restricts their exposure to a biased subset of perspectives. To avoid bias, it has become common practice for news aggregators to provide articles based on source diversity: for each story, they pick articles from news sources with different political leanings. In this paper, we ask whether this practice is sufficient. Specifically, we study how well the diversity of viewpoints, particularly with respect to entities, is covered by articles picked using plain source diversity. We analyze a dataset fetched from Google News and find that, even though the top articles exhibit some diversity with respect to the leanings of the news outlets, many possible viewpoints towards the entities are missing. Based on this observation, we design novel methods for selecting a small set of articles that cover all possible viewpoints; to ensure that our selections are useful, we show how to incorporate user preferences into our model. Our experiments on four real-world datasets show that our algorithms cover significantly more different viewpoints than previous baselines.
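The abstract does not specify the selection algorithm, but the underlying problem resembles weighted set cover. As a rough, hedged illustration of that general idea (not the authors' method), the following minimal Python sketch greedily picks articles until all viewpoints are covered; the article-to-viewpoint mapping, the preference weights, and all names below are hypothetical inputs.

# Hypothetical sketch: greedily select a small article set covering all viewpoints.
# viewpoints_of: article id -> set of (entity, stance) viewpoints it expresses.
# preference: optional user weight per viewpoint (defaults to 1.0).
def greedy_viewpoint_cover(viewpoints_of, preference=None, max_articles=None):
    preference = preference or {}
    uncovered = set().union(*viewpoints_of.values())
    selected = []
    while uncovered and (max_articles is None or len(selected) < max_articles):
        # Pick the article whose not-yet-covered viewpoints carry the most weight.
        best = max(
            viewpoints_of,
            key=lambda a: sum(preference.get(v, 1.0) for v in viewpoints_of[a] & uncovered),
        )
        gain = viewpoints_of[best] & uncovered
        if not gain:
            break
        selected.append(best)
        uncovered -= gain
    return selected

if __name__ == "__main__":
    articles = {
        "a1": {("candidate_X", "pro"), ("candidate_Y", "con")},
        "a2": {("candidate_X", "con")},
        "a3": {("candidate_Y", "pro"), ("candidate_X", "con")},
    }
    print(greedy_viewpoint_cover(articles))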
DebiasedKGE: Towards Mitigating Spurious Forgetting in Continual Knowledge Graph Embedding
To maintain an effective memory of old knowledge in a dynamically growing knowledge environment, continual knowledge graph embedding (CKGE) focuses on alleviating catastrophic forgetting. However, existing CKGE methods still suffer substantial performance degradation in dynamic knowledge graphs (DKG). We have found this challenge is mainly posed by spurious forgetting, a previously overlooked phenomenon that arises from the inherent interference effects in the continual learning (CL) process. In this paper, we deeply explore spurious forgetting in CKGE. First, we reveal two primary causes of spurious forgetting, knowledge interference and knowledge misalignment, and how they affect knowledge biasing within dynamic learning scenarios. Second, to fill this research gap, we propose a robust and efficient CKGE method (DebiasedKGE) for mitigating spurious forgetting. Specifically, to alleviate knowledge interference, we propose a mutual information-guided disentangled learning mechanism, which identifies latent features of different knowledge types and learns independent semantic representations for each, thereby reducing interference in knowledge embedding. Furthermore, to mitigate the deviation of new knowledge from previously learned knowledge, we design a dual-view regularized knowledge alignment mechanism that jointly constrains both the magnitude and direction of embedding transitions. Finally, we evaluate DebiasedKGE on four public CKGE datasets and two additional datasets constructed to contain knowledge perturbations of different dimensions. The results show that DebiasedKGE effectively alleviates spurious forgetting and achieves significant performance improvements. Our codes and datasets are available at https://anonymous.4open.science/r/DebiasedKGE.
LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multimodal Large Language Models
Deep generative models like VAEs and diffusion models have advanced various generation tasks by leveraging latent variables to learn data distributions and generate high-quality samples. Despite the field of explainable AI making strides in interpreting machine learning models, understanding latent variables in generative models remains challenging. This paper introduces LatentExplainer, a framework for automatically generating semantically meaningful explanations of latent variables in deep generative models. LatentExplainer tackles three main challenges: inferring the meaning of latent variables, aligning explanations with inductive biases, and handling varying degrees of explainability. Our approach perturbs latent variables, interprets changes in generated data, and uses multimodal large language models (MLLMs) to produce human-understandable explanations. We evaluate our proposed method on several real-world and synthetic datasets, and the results demonstrate superior performance in generating high-quality explanations for latent variables. The results highlight the effectiveness of incorporating inductive biases and uncertainty quantification, significantly enhancing model interpretability.
FinCast: A Foundation Model for Financial Time-Series Forecasting
Financial time-series forecasting is critical for maintaining economic stability, guiding informed policymaking, and promoting sustainable investment practices. However, it remains challenging due to various underlying pattern shifts. These shifts arise primarily from three sources: temporal non-stationarity (distribution changes over time), multi-domain diversity (distinct patterns across financial domains such as stocks, commodities, and futures), and varying temporal resolutions (patterns differing across per-second, hourly, daily, or weekly indicators). While recent deep learning methods attempt to address these complexities, they frequently suffer from overfitting and typically require extensive domain-specific fine-tuning. To overcome these limitations, we introduce FinCast, the first foundation model specifically designed for financial time-series forecasting, trained on large-scale financial datasets. Remarkably, FinCast exhibits robust zero-shot performance, effectively capturing diverse patterns without domain-specific fine-tuning. Comprehensive empirical and qualitative evaluations demonstrate that FinCast surpasses existing state-of-the-art methods, highlighting its strong generalization capabilities.
FunLoc: A Novel Function-level Bug Localization Framework Enhanced by Contrastive and Active Learning Strategies
The increasing complexity of software systems has made them more prone to bugs, prompting the development of automated bug localization techniques to ensure software reliability. Despite these techniques having demonstrated notable success at the file level, their application and optimization at the function level often encounter serious performance cliffs. This limitation underscores the urgent need for a dedicated framework for function-level bug localization, which we address through FunLoc, a novel framework that takes coarse-grained source files as input units and identifies fine-grained buggy functions as output. To address the critical challenges of handling domain-specific bug reports and managing vast function-level sample space, we introduce two key innovations that are seamlessly integrated into FunLoc. First, we design a contrastive learning-based domain-adaptive language model to enhance the framework's ability to process and interpret specialized bug reports effectively. Second, we propose an active learning-based dynamic negative sampling strategy to address the scalability issues arising from the extensive function-level sample space. To evaluate the effectiveness of our approach, we extend and release a function-level bug localization dataset derived from large-scale real-world projects. Extensive experiments demonstrate that our approach outperforms state-of-the-art techniques.
MGSTDN: Multi-Granularity Spatial-Temporal Diffusion Network for Next POI Recommendation
Next Point-of-Interest (POI) prediction is important to various human mobility applications, such as route planning and location-based advertising. To address the spatial-temporal sparsity issues arising from users' irregular and inconsistent visit times to different POIs, multi-granular structures can be incorporated to enhance feature representation through hierarchical relationships. However, existing methods often fall short in capturing the comprehensive multi-granularity spatial-temporal correlations due to three primary limitations: (1) users' complex mobility patterns entangled in single trajectory data, (2) limited mobility pattern details due to independent modeling at each granularity, and (3) low inference efficiency in cascaded multi-granularity predictions. To tackle these challenges, we propose a novel approach that models transformations across different granularities in both spatial regions and temporal periods as a diffusion process, leading to the development of the Multi-Granularity Spatial-Temporal Diffusion Network (MGSTDN). In particular, this model adopts a multi-task architecture, where predictions at varying spatial-temporal granularities (i.e., different diffusion steps) are treated as distinct tasks. By employing a multi-granularity diffusion mechanism in both spatial and temporal dimensions, it captures more nuanced spatial-temporal correlations, enhancing the physical constraints and behavioral pattern dependencies across granularities. During the diffusion process's forward stage, coarser-grained regions and periods are derived based on fine-grained features. In the reverse stage, finer-grained regions and periods are recovered from coarse-grained features, guided by encoded historical trajectory information, until the next POI is determined. To improve computational efficiency, we introduce a multi-granularity mapping propagation matrix, enabling parallel computation and accelerating the prediction process across different granularities. We evaluated the effectiveness of MGSTDN through extensive experiments on three datasets, demonstrating significant improvements over existing methods.
Frequency-Decoupled Distillation for Efficient Multimodal Recommendation
Multimodal recommender systems (MMRec) leverage multimodal features, such as visual and textual data, to improve recommendation performance, playing a key role in platforms like online shopping and short videos. However, the large modality encoders and complex processing modules of MMRec significantly reduce its efficiency. A promising solution is compressing MMRec into an ID-based MLP model (MLPRec), which has a simpler structure and avoids complex modality handling. However, traditional knowledge distillation methods struggle to transfer knowledge effectively from MMRec to MLPRec, due to differences in their model structure and capacity. To address this, we propose a frequency-decoupled knowledge distillation framework, FDRec, to efficiently transfer knowledge from MMRec to MLPRec. By analyzing graph signals from a signal processing perspective, we propose decoupling the distillation process into low-frequency and high-frequency components, ensuring effective transmission of challenging high-frequency knowledge while preventing it from being overshadowed by monotonous low-frequency signals. To address the instability and fragmentation issues of KL divergence in traditional distillation approaches, we introduce the Wasserstein distance, which captures geometric structure and provides stable gradients. Additionally, FDRec incorporates an embedding-level contrastive learning method, further enhancing the transfer of refined knowledge from MMRec and injecting graph structure information into MLPRec for more effective distillation. Extensive experiments on four benchmark datasets and five popular MMRec models show that FDRec not only significantly reduces the computational costs and improves the inference efficiency, but also achieves comparable or even superior performance compared to MMRec. Our code is available at: https://github.com/Suehn/FDRec_
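The low/high-frequency split mentioned in the abstract is a standard graph-signal-processing construction. A rough NumPy sketch of that split under simple assumptions is shown below; the adjacency matrix, the plain MSE stand-in for the paper's Wasserstein-based losses, and all function names are illustrative, not FDRec's implementation.

import numpy as np

# Hypothetical sketch: split node embeddings into low- and high-frequency parts.
# A: adjacency matrix (n x n); X: node embeddings (n x d) from a teacher model.
def frequency_decouple(A, X):
    deg = A.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    A_hat = d_inv_sqrt @ A @ d_inv_sqrt   # symmetrically normalized adjacency
    low = A_hat @ X                        # low-frequency (smoothed) component
    high = X - low                         # high-frequency (detail) component
    return low, high

# Distill each component with its own weighted loss; plain MSE stands in for
# the Wasserstein-based objective described in the abstract.
def decoupled_distill_loss(A, X_teacher, X_student, w_low=1.0, w_high=1.0):
    t_low, t_high = frequency_decouple(A, X_teacher)
    s_low, s_high = frequency_decouple(A, X_student)
    return (w_low * np.mean((t_low - s_low) ** 2)
            + w_high * np.mean((t_high - s_high) ** 2))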
Vulnerability-Aware Hardening for Secure Privacy-Preserving Record Linkage
Privacy-Preserving Record Linkage (PPRL) aims to link records across multiple data sources without revealing any sensitive information about the entities whose records are being linked. However, recent studies have identified attacks that exploit multiple vulnerabilities in popular PPRL methods. To address such vulnerabilities and prevent possible reidentification, hardening techniques have been proposed to perturb patterns in encodings. Most such hardening techniques are either specific to bit array based encodings (such as Bloom filters), or they rely on randomness which can negatively affect linkage quality. Here we propose a novel hardening technique that addresses the frequency, similarity, and co-occurrence vulnerabilities, and is applicable on any PPRL method that uses character q-grams. Our technique identifies and hardens only those q-grams that are vulnerable, and modifies them using a non-random, context-aware approach that ensures these q-grams are not vulnerable after hardening. We evaluate our technique using real and synthetic data sets, and show that it substantially reduces the vulnerabilities of PPRL encoding methods and makes them more secure.
Relational Multi-Path Enhancement for Extrapolative Relation Reasoning in Temporal Knowledge Graph
Relation reasoning in temporal knowledge graphs (TKGs) infers unknown or emerging relational dependencies from historical structured data. Traditional approaches face inherent limitations in capturing complex semantic correlations and structural patterns among relations. To tackle this problem, we propose the Relational Multi-path Enhancement network (RME), which primarily focuses on relation modeling to enrich relation representations through comprehensive multi-path analysis. RME consists of five key components: (1) A controlled random walk module creates multi-hop head-to-tail paths using an adaptive stopping rule that balances short- and long-term connections. (2) A shared path extraction module identifies both shared-head paths and shared-tail paths. (3) A time-decayed path encoding module processes these paths differently. (4) A gated information aggregation module combines path information to determine which parts matter most. (5) An attention decoding module makes the final prediction by focusing on the most relevant path features. Experiments on multiple TKG benchmark datasets demonstrate that RME outperforms the state-of-the-art methods in relation multi-path reasoning.
SESSION: Short Research Papers
Explicit Path CGR: Maintaining Sequence Fidelity in Geometric Representations
We present a novel information-preserving Chaos Game Representation (CGR) method, also called Reverse-CGR (R-CGR), for biological sequence analysis that addresses the fundamental limitation of traditional CGR approaches: the loss of sequence information during geometric mapping. Our method introduces complete sequence recovery through explicit path encoding combined with rational arithmetic precision control, enabling perfect sequence reconstruction from stored geometric traces. Unlike purely geometric approaches, our reversibility is achieved through comprehensive path storage that maintains both positional and character information at each step. We demonstrate the effectiveness of R-CGR on biological sequence classification tasks, achieving competitive performance compared to traditional sequence-based methods while providing interpretable geometric visualizations. The approach generates feature-rich images suitable for deep learning while maintaining complete sequence information through explicit encoding, opening new avenues for interpretable bioinformatics analysis where both accuracy and sequence recovery are essential.
Uncovering the Persuasive Fingerprint of LLMs in Jailbreaking Attacks
Despite recent advances, Large Language Models (LLMs) remain vulnerable to jailbreak attacks that bypass alignment safeguards and elicit harmful outputs. While prior research has proposed various attack strategies differing in human readability and transferability, little attention has been paid to the linguistic and psychological mechanisms that may influence a model's susceptibility to such attacks. In this paper, we examine an interdisciplinary line of research that leverages foundational theories of persuasion from the social sciences to craft adversarial prompts capable of circumventing alignment constraints in LLMs. Drawing on well-established persuasive strategies, we hypothesize that LLMs, having been trained on large-scale human-generated text, may respond more compliantly to prompts with persuasive structures. Furthermore, we investigate whether LLMs themselves exhibit distinct persuasive fingerprints that emerge in their jailbreak responses. Empirical evaluations across multiple aligned LLMs reveal that persuasion-aware prompts significantly bypass safeguards, demonstrating their potential to induce jailbreak behaviors. This work underscores the importance of cross-disciplinary insight in addressing the evolving challenges of LLM safety. The code and data are available at https://github.com/CyberScienceLab/Our-Papers/tree/main/PersuasiveJailbreaking/.
Compressed Concatenation of Small Embedding Models
Embedding models are central to dense retrieval, semantic search, and recommendation systems, but their size often makes them impractical to deploy in resource-constrained environments such as browsers or edge devices. While smaller embedding models offer practical advantages, they typically underperform compared to their larger counterparts. To bridge this gap, we demonstrate that concatenating the raw embedding vectors of multiple small models can outperform a single larger baseline on standard retrieval benchmarks. To overcome the resulting high dimensionality of naive concatenation, we introduce a lightweight unified decoder trained with a Matryoshka Representation Learning (MRL) loss. This decoder maps the high-dimensional joint representation to a low-dimensional space, preserving most of the original performance without fine-tuning the base models. We also show that while concatenating more base models yields diminishing gains, the robustness of the decoder's representation under compression and quantization improves. Our experiments show that, on a subset of MTEB retrieval tasks, our concat-encode-quantize pipeline recovers 89% of the original performance with a 48× compression factor when the pipeline is applied to a concatenation of four small embedding models.
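To make the concatenate-then-compress pipeline concrete, here is a rough, hedged PyTorch sketch: two toy encoders stand in for the small embedding models, a linear decoder compresses the concatenation, and an InfoNCE-style retrieval loss is applied at nested prefix dimensions in the Matryoshka style. The encoder sizes, prefix dimensions, and temperature are illustrative, and the quantization step from the paper's pipeline is not shown.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy encoders standing in for two small embedding models.
enc_a = nn.Sequential(nn.Linear(128, 384), nn.Tanh())
enc_b = nn.Sequential(nn.Linear(128, 256), nn.Tanh())
decoder = nn.Linear(384 + 256, 128)   # maps the concatenation to a compact space

def embed(x):
    # Concatenate the raw embeddings of the base models, then compress.
    return decoder(torch.cat([enc_a(x), enc_b(x)], dim=-1))

def mrl_info_nce(q, d, dims=(32, 64, 128), tau=0.05):
    # Matryoshka-style objective: apply InfoNCE at nested prefix dimensions.
    loss = 0.0
    for k in dims:
        qk, dk = F.normalize(q[:, :k], dim=-1), F.normalize(d[:, :k], dim=-1)
        logits = qk @ dk.t() / tau
        labels = torch.arange(q.size(0))
        loss = loss + F.cross_entropy(logits, labels)
    return loss / len(dims)

queries, docs = torch.randn(16, 128), torch.randn(16, 128)
loss = mrl_info_nce(embed(queries), embed(docs))
loss.backward()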
Enhancing Fake News Video Detection via LLM-Driven Creative Process Simulation
The emergence of fake news on short video platforms has become a significant new societal concern, necessitating automatic video-news-specific detection. Current detectors primarily rely on pattern-based features to separate fake news videos from real ones. However, limited and less diversified training data lead to biased patterns and hinder their performance. This weakness stems from the complex many-to-many relationships between video material segments and fabricated news events in real-world scenarios: a single video clip can be utilized in multiple ways to create different fake narratives, while a single fabricated event often combines multiple distinct video segments. However, existing datasets do not adequately reflect such relationships due to the difficulty of collecting and annotating large-scale real-world data, resulting in sparse coverage and non-comprehensive learning of the characteristics of potential fake news video creation. To address this issue, we propose a data augmentation framework, AgentAug, that generates diverse fake news videos by simulating typical creative processes. AgentAug implements multiple LLM-driven pipelines covering four fabrication categories for news video creation, combined with an active learning strategy based on uncertainty sampling to select the potentially useful augmented samples during training. Experimental results on two benchmark datasets demonstrate that AgentAug consistently improves the performance of short video fake news detectors.
State & Geopolitical Censorship on Twitter (X): Detection & Impact Analysis of Withheld Content
State and geopolitical censorship on Twitter, now X, has become routine, raising concerns about the boundaries between criminal content and freedom of speech. One such censorship practice, withholding content within a particular state, has received renewed attention due to Elon Musk's apparent willingness to comply with state demands. In this study, we present the first quantitative analysis of the impact of state censorship by withholding on social media, using a dataset in which two prominent patterns emerged: Russian accounts censored in the EU for spreading state-sponsored narratives, and Turkish accounts blocked within Turkey for promoting militant propaganda. We find that censorship has little impact on posting frequency but significantly reduces likes and retweets by 25% and follower growth by 90%, especially when the censored region aligns with the account's primary audience. Meanwhile, some Russian accounts continue to experience growth as their audience is outside the withholding jurisdictions. We develop a user-level binary classifier with a transformer backbone and temporal aggregation strategies, aiming to predict whether an account is likely to be withheld. Through an ablation study, we find that tweet content is the primary signal in predicting censorship, while tweet metadata and profile features contribute marginally. Our best model achieves an F1 score of 0.73 and an AUC of 0.83. This work informs debates on platform governance, free speech, and digital repression.
T-Retrievability: A Topic-Focused Approach to Measure Fair Document Exposure in Information Retrieval
Retrievability of a document is a collection-based statistic that measures its expected (reciprocal) rank of being retrieved within a specific rank cut-off. A collection with uniformly distributed retrievability scores across documents is an indicator of fair document exposure. While retrievability scores have been used to quantify the fairness of exposure for a collection, in our work, we use the distribution of retrievability scores to measure the exposure bias of retrieval models. We hypothesise that an uneven distribution of retrievability scores across the entire collection may not accurately reflect exposure bias but rather indicate variations in topical relevance. As a solution, we propose a topic-focused localised retrievability measure, which we call T-Retrievability (topic-retrievability), which first computes retrievability scores over multiple groups of topically-related documents, and then aggregates these localised values to obtain the collection-level statistics. Our analysis using this proposed T-Retrievability measure uncovers new insights into the exposure characteristics of various neural ranking models. The findings suggest that this localised measure provides a more nuanced understanding of exposure fairness, offering a more reliable approach for assessing document accessibility in IR systems.
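A minimal Python sketch of the general idea follows, assuming the simplest choices: uniformly weighted queries, an indicator-style retrievability weight within the cutoff, the Gini coefficient as the within-group inequality statistic, and a plain mean over topic groups as the aggregation. The exact weighting function and aggregation used in the paper may differ.

from collections import defaultdict

def retrievability(run, cutoff=100):
    # run: query id -> ranked list of doc ids. r(d) counts (uniformly weighted)
    # how often d appears within the rank cutoff.
    r = defaultdict(float)
    for ranking in run.values():
        for rank, doc in enumerate(ranking[:cutoff], start=1):
            r[doc] += 1.0            # a rank-discounted weight could be used instead
    return r

def gini(values):
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    cum = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * cum) / (n * total) - (n + 1.0) / n

def t_retrievability(run, doc_topic, cutoff=100):
    # Localise the inequality measure within topically related document groups,
    # then aggregate (here: a simple mean over groups).
    r = retrievability(run, cutoff)
    groups = defaultdict(list)
    for doc, topic in doc_topic.items():
        groups[topic].append(r.get(doc, 0.0))
    return sum(gini(vs) for vs in groups.values()) / max(len(groups), 1)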
Pruning Strategies for Backdoor Defense in LLMs
Backdoor attacks are a significant threat to the performance and integrity of pre-trained language models. Although such models are routinely fine-tuned for downstream NLP tasks, recent work shows they remain vulnerable to backdoor attacks that survive vanilla fine-tuning. These attacks are difficult to defend because end users typically lack knowledge of the attack triggers. Such attacks consist of stealthy malicious triggers introduced through subtle syntactic or stylistic manipulations, which can bypass traditional detection and remain in the model, making post-hoc purification essential. In this study, we explore whether attention-head pruning can mitigate these threats without any knowledge of the trigger or access to a clean reference model. To this end, we design and implement six pruning-based strategies: (i) gradient-based pruning, (ii) layer-wise variance pruning, (iii) gradient-based pruning with structured L1/L2 sparsification, (iv) randomized ensemble pruning, (v) reinforcement-learning-guided pruning, and (vi) Bayesian uncertainty pruning. Each method iteratively removes the least informative heads while monitoring validation accuracy to avoid over-pruning. Experimental evaluation shows that gradient-based pruning performs best while defending the syntactic triggers, whereas reinforcement learning and Bayesian pruning better withstand stylistic attacks.
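The gradient-based variant can be approximated with the head-mask trick familiar from earlier head-pruning work: differentiate the loss with respect to a per-head mask and treat the gradient magnitude as head importance. The sketch below uses a tiny randomly initialized BERT and synthetic data purely for illustration (not the paper's models, datasets, or validation-monitored pruning schedule), and it assumes the Hugging Face transformers library.

import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny randomly initialized model and a synthetic batch, purely for illustration.
config = BertConfig(hidden_size=64, num_hidden_layers=4, num_attention_heads=4,
                    intermediate_size=128, num_labels=2)
model = BertForSequenceClassification(config)
input_ids = torch.randint(0, config.vocab_size, (8, 16))
labels = torch.randint(0, 2, (8,))

# Gradient-based head importance: differentiate the loss w.r.t. a head mask.
head_mask = torch.ones(config.num_hidden_layers, config.num_attention_heads,
                       requires_grad=True)
loss = model(input_ids=input_ids, labels=labels, head_mask=head_mask).loss
loss.backward()
importance = head_mask.grad.abs()

# Remove the k least important heads (the validation-accuracy check that guards
# against over-pruning is omitted here).
k = 2
to_prune = {}
for idx in importance.flatten().argsort()[:k].tolist():
    layer, head = divmod(idx, config.num_attention_heads)
    to_prune.setdefault(layer, []).append(head)
model.prune_heads(to_prune)

In practice, this scoring and pruning step would sit inside an iterative loop that re-estimates importance on clean validation data and stops once accuracy begins to drop, as described in the abstract.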
More Women, Same Stereotypes: Unpacking the Gender Bias Paradox in Large Language Models
Large Language Models (LLMs) have revolutionized natural language processing, yet concerns persist regarding their tendency to reflect or amplify social biases. This study introduces a novel evaluation framework to uncover gender biases in LLMs: using free-form storytelling to surface biases embedded within the models. A systematic analysis of ten prominent LLMs shows a consistent pattern of overrepresenting female characters across occupations, likely due to supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). Paradoxically, despite this overrepresentation, the occupational gender distributions produced by these LLMs align more closely with human stereotypes than with real-world labor data. This highlights the challenge and importance of implementing balanced mitigation measures to promote fairness and prevent the establishment of potentially new biases. We release the prompts and LLM-generated stories on GitHub.
Improving Graph Autoencoders by Hard Sample Refinement with Global Similarity
Masked graph autoencoders (GAEs) have attracted significant attention in recent years. GAEs typically leverage graph neural networks to reconstruct topological properties and node features. However, existing feature-based GAEs face performance bottlenecks, particularly on hard-to-reconstruct nodes, due to their excessive reliance on local aggregation. To address this limitation, we propose a novel framework, Global-Similarity-Enhanced Graph Autoencoder (GSE-GAE). GSE-GAE adopts a knowledge distillation strategy within a self-supervised teacher-student architecture. Specifically, a teacher module integrates raw features and topology with long-range structural augmentations for hard nodes, while a representation alignment loss ensures effective transfer of global knowledge to the student model. Extensive experiments demonstrate the superiority of GSE-GAE, providing new insights into improving performance.
Multimodal Contrastive Learning with Early Fusion for Robust Medical Signal Representation
Contrastive learning has achieved remarkable success in the representation learning of physiological signals foundation models. However, current approaches often focus on unimodal settings or treat each modality independently, neglecting the rich synergistic information that can emerge from cross-modal interactions. This leads to suboptimal representations that fail to capture complex interdependencies across modalities. To address this limitation, we propose a multimodal contrastive learning framework that aligns fused representations instead of individual signals. Specifically, we encode ECG Lead II, ECG Lead V, and PPG signals using modality-specific encoders, followed by a fusion block to integrate modality-specific embeddings into a unified representation. By applying contrastive learning to the unified representation, our approach effectively mitigates inter-modal conflicts while capturing complementary cross-modal features that would otherwise be lost in traditional alignment strategies. Experiments on MIMIC-III (internal) and VitalDB (external) datasets demonstrate that our approach outperforms existing baselines in patient attribute prediction tasks, validating its effectiveness in learning comprehensive multimodal representations of physiological signals.
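As a rough illustration of the early-fusion idea (not the paper's architecture), the PyTorch sketch below uses three hypothetical MLP encoders for ECG lead II, ECG lead V, and PPG, concatenates their embeddings, projects them through a fusion block, and contrasts two noise-augmented views of the fused representation with an InfoNCE-style loss. The signal lengths, encoder shapes, and augmentations are placeholders.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical modality-specific encoders for ECG lead II, ECG lead V, and PPG.
def make_encoder(in_len, dim=64):
    return nn.Sequential(nn.Linear(in_len, 128), nn.ReLU(), nn.Linear(128, dim))

enc = nn.ModuleList([make_encoder(500), make_encoder(500), make_encoder(500)])
fusion = nn.Sequential(nn.Linear(3 * 64, 128), nn.ReLU(), nn.Linear(128, 64))

def fuse(signals):
    # Early fusion: concatenate modality embeddings, then project to one vector.
    return fusion(torch.cat([e(s) for e, s in zip(enc, signals)], dim=-1))

def contrastive_loss(z1, z2, tau=0.1):
    # Contrast two augmented views of the *fused* representation.
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0))
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

batch = [torch.randn(32, 500) for _ in range(3)]               # ECG-II, ECG-V, PPG
view1 = fuse([s + 0.01 * torch.randn_like(s) for s in batch])  # noise augmentation
view2 = fuse([s + 0.01 * torch.randn_like(s) for s in batch])
loss = contrastive_loss(view1, view2)
loss.backward()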
G2IFS: Global-to-Instance Feature Selection in Deep Recommender System
Feature selection plays a vital role in recommender systems by identifying informative features for accurate prediction. While adaptive methods like AdaFS select instance-wise features based on sample variability, they often overlook globally important features and suffer from limited transferability. To address these limitations, we propose G2IFS (Global-to-Instance Feature Selection), a novel framework that integrates global distributional patterns with instance-level adaptation for more robust and generalizable feature selection. G2IFS consists of an online statistics module, a main scoring network, and a non-parametric Gaussian mixture module. The online statistics module maintains global estimates of class-wise statistics to compute Fisher scores, which guide the scoring network in learning instance-specific feature importance. The Gaussian module further mitigates co-adaptation and improves transferability. Extensive experiments across diverse recommendation models and real-world datasets show that G2IFS consistently outperforms state-of-the-art baselines in terms of accuracy, efficiency, and transferability. In-depth analysis further reveals that various global importance signals, when integrated into traditional methods like AdaFS, consistently lead to significant performance improvements, underscoring the general effectiveness of combining global and instance-level signals in recommender system feature selection. The code is available at https://github.com/youlj109/G2IFS.
VQA-Induct: Instruction Induction for Visual Question Answering
Multimodal Large Language Models (MLLMs) have shown strong capabilities in Visual Question Answering (VQA) tasks. However, current approaches for enhancing VQA reasoning performance often assume access to extensive resources such as large annotated datasets, external tools, or numerous demonstrations, which are impractical for real-world users who typically possess only a few demonstrations. We present VQA-Induct, a framework for data-scarce scenarios that leverages MLLMs' instruction induction capabilities to induce reusable, purely textual task-level instructions from as few as three demonstrations of the same task, then applies these instructions to new instances using only their image-question pairs. Comprehensive experiments on PuzzleVQA and AlgoPuzzleVQA across diverse MLLMs demonstrate that our method outperforms state-of-the-art methods without requiring demonstrations at inference time. Furthermore, instructions induced by stronger models effectively boost the performance of smaller models, enabling cost-efficient reasoning at inference time.
Arrows of Math Reasoning Data Synthesis for Large Language Models: Diversity, Complexity and Correctness
Enhancing the mathematical reasoning of large language models (LLMs) demands high-quality training data, yet conventional methods face critical challenges in scalability, cost, and data reliability. To address these limitations, we propose a novel program-assisted synthesis framework that systematically generates a high-quality mathematical corpus with guaranteed diversity, complexity, and correctness. This framework integrates mathematical knowledge systems and domain-specific tools to create executable programs. These programs are then translated into natural language problem-solution pairs and vetted by a bilateral validation mechanism that verifies solution correctness against program outputs and ensures program-problem consistency. We have generated 12.3 million such problem-solving triples. Experiments demonstrate that models fine-tuned on our data significantly improve their inference capabilities, achieving state-of-the-art performance on several benchmark datasets and showcasing the effectiveness of our synthesis approach.
Toward Secure Federated Partial Label Learning Against Poisoning Attacks
How to defend against attacks in Federated Partial Label Learning (FedPLL) is a new and challenging question in machine learning security due to the stealthy and efficient attack behaviors of adversaries. In this paper, we systematically study this problem by developing an Adaptive Partial Label Attack (APLA) which subtly manipulates the candidate label set of the data sample. To defend against APLA, we develop the RobustFedPLL framework incorporating three modules: (1) in preliminary clustering, we implement a Gaussian Mixture Model (GMM) and a moving average mechanism to estimate clients' confidence; (2) in representation contrasting, we develop a contrast-based algorithm to obtain clients' model feature representations; (3) in final clustering, we utilize mainstream clustering algorithms to finally distinguish adversaries. Experiments on two datasets comparing RobustFedPLL with SOTA defense algorithms demonstrate the superiority of RobustFedPLL under various experimental settings.
Integrating Time Series into LLMs via Multi-layer Steerable Embedding Fusion for Enhanced Forecasting
Time series (TS) data are ubiquitous across various application areas, rendering time series forecasting (TSF) a fundamental task. With the astounding advances in large language models (LLMs), a variety of methods have been developed to adapt LLMs for time series forecasting. Despite unlocking the potential of LLMs in comprehending TS data, existing methods are inherently constrained by their shallow integration of TS information, wherein LLMs typically access TS representations at shallow layers, primarily at the input layer. This causes the influence of TS representations to progressively fade in deeper layers and eventually leads to ineffective adaptation between textual embeddings and TS representations. In this paper, we propose the Multi-layer Steerable Embedding Fusion (MSEF), a novel framework that enables LLMs to directly access time series patterns at all depths, thereby mitigating the progressive loss of TS information in deeper layers. Specifically, MSEF leverages off-the-shelf time series foundation models to extract semantically rich embeddings, which are fused with intermediate text representations across LLM layers via layer-specific steering vectors. These steering vectors are designed to continuously optimize the alignment between time series and textual modalities and facilitate a layer-specific adaptation mechanism that ensures efficient few-shot learning capabilities. Experimental results on seven benchmarks demonstrate significant performance improvements by MSEF compared with baselines, with an average reduction of 31.8% in terms of MSE. The code is available at https://github.com/One1sAll/MSEF.
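As a rough illustration of layer-specific steering under stated assumptions (a tanh-gated additive injection and arbitrary hidden and time-series embedding dimensions), a per-layer fusion module might look like the following sketch; it is not the released MSEF code.

```python
# Illustrative sketch only: inject a time-series embedding at a given LLM layer.
import torch
import torch.nn as nn

class LayerSteering(nn.Module):
    """Per-layer steering vector that adds a projected time-series embedding
    to the hidden states of one transformer layer (assumed design)."""
    def __init__(self, ts_dim, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(ts_dim, hidden_dim)
        self.gate = nn.Parameter(torch.zeros(1))       # tanh(0) = 0, starts as a no-op

    def forward(self, hidden_states, ts_embedding):
        # hidden_states: (batch, seq, hidden_dim); ts_embedding: (batch, ts_dim)
        steer = self.proj(ts_embedding).unsqueeze(1)            # (batch, 1, hidden_dim)
        return hidden_states + torch.tanh(self.gate) * steer    # fused at this layer

# One module per transformer layer (layer count and dims are assumptions):
# steerers = nn.ModuleList([LayerSteering(ts_dim=256, hidden_dim=4096)
#                           for _ in range(num_llm_layers)])
```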
FUTURE: Flexible Unlearning for Tree Ensemble
Tree ensembles are widely recognized for their effectiveness in classification tasks, achieving state-of-the-art performance across diverse domains, including bioinformatics, finance, and medical diagnosis. With increasing emphasis on data privacy and the right to be forgotten, several unlearning algorithms have been proposed to enable tree ensembles to forget sensitive information. However, existing methods are often tailored to a particular model or rely on the discrete tree structure, making them difficult to generalize to complex ensembles and inefficient for large-scale datasets. To address these limitations, we propose FUTURE, a novel unlearning algorithm for tree ensembles. Specifically, we formulate the problem of forgetting samples as a gradient-based optimization task. To accommodate the non-differentiability of tree ensembles, we adopt probabilistic model approximations within the optimization framework. This enables end-to-end unlearning in an effective and efficient manner. Extensive experiments on real-world datasets show that FUTURE yields significant and successful unlearning performance.
Contrastive ECOC: Learning Output Codes for Adversarial Defense
Although one-hot encoding is commonly used for multiclass classification, it is not always the most effective encoding mechanism. Error Correcting Output Codes (ECOC) address multiclass classification by mapping each class to a unique codeword used as a label. Traditional ECOC methods rely on manually designed or randomly generated codebooks, which are labor-intensive and may yield suboptimal, dataset-agnostic results. This paper introduces three models for automated codebook learning based on contrastive learning, allowing codebooks to be learned directly and adaptively from data. Across four datasets, our proposed models demonstrate superior robustness to adversarial attacks compared to two baselines. The source code is available on GitHub.
MU-OT: Effective and Unified Machine Unlearning with Optimal Transport for Feature Realignment
Machine unlearning has emerged as a significant research topic in response to the increasing demands for data privacy and compliance with privacy regulations. The main challenge is to eliminate the influence of a specific subset of training data from a pretrained model while preserving the model's performance on the retain set without retraining the model from scratch. In this paper, we propose a novel efficient unlearning framework based on Optimal Transport, which can effectively work on both class-wise and instance-wise unlearning tasks. By analyzing and comparing the feature spaces of the original and retrained models, we formulate the unlearning problem as a distribution alignment task between the forget set and the retain set. We guide the feature distribution of the forget set, which initially forms distinct and structured patterns, to align with that of the retain set. Extensive experiments on three public benchmark datasets demonstrate its superior effectiveness compared to previous state-of-the-art methods.
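The distribution-alignment view can be pictured with a small entropic optimal transport (Sinkhorn) sketch that scores, differentiably, how far forget-set features sit from retain-set features; the squared-distance cost, regularization strength, and iteration count are assumptions, and this is not the authors' MU-OT implementation.

```python
# Illustrative sketch only: entropic OT cost between forget and retain features.
import torch

def sinkhorn_alignment_loss(forget_feats, retain_feats, eps=0.05, iters=50):
    """Returns a differentiable transport cost that can be minimized to pull
    forget-set features toward the retain-set feature distribution."""
    cost = torch.cdist(forget_feats, retain_feats) ** 2      # pairwise squared distances
    n, m = cost.shape
    mu = torch.full((n,), 1.0 / n, device=cost.device)       # uniform source weights
    nu = torch.full((m,), 1.0 / m, device=cost.device)       # uniform target weights
    K = torch.exp(-cost / eps)                                # Gibbs kernel
    u = torch.ones_like(mu)
    for _ in range(iters):                                    # Sinkhorn iterations
        v = nu / (K.t() @ u + 1e-12)
        u = mu / (K @ v + 1e-12)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)                # approximate transport plan
    return (plan * cost).sum()
```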
H-PRM: A Pluggable Hotword Pre-Retrieval Module for Various Speech Recognition Systems
Hotword customization is crucial in ASR to enhance the accuracy of domain-specific terms. It has been primarily driven by the advancements in traditional models and Audio large language models (LLMs). However, existing models often struggle with large-scale hotword lists, since the recognition rate drops dramatically as the number of hotwords increases. In this paper, we introduce a novel hotword customization system that utilizes a hotword pre-retrieval module (H-PRM) to identify the most relevant hotword candidate by measuring the acoustic similarity between the hotwords and the speech segment. This plug-and-play solution can be easily integrated into traditional models such as SeACo-Paraformer, significantly enhancing the hotword post-recall rate (PRR). Additionally, we incorporate H-PRM into Audio LLMs through a prompt-based approach, enabling seamless customization of hotwords. Extensive testing validates that H-PRM can outperform existing methods, showing a new direction for hotword customization in ASR.
DP-COMET: A Differential Privacy Contextual Obfuscation MEchanism for Texts in Natural Language Processing
Protecting sensitive information within textual data strongly depends on the context in which the data is presented. However, current privacy-preserving obfuscation mechanisms based on epsilon-Differential Privacy (DP) produce an obfuscated private text, changing the original phrase term-by-term without considering the context in which such a term is placed. This paper introduces DP-COMET, an epsilon-DP obfuscation mechanism that evaluates a text's context before producing its private version. The mechanism defines a representation of the original text that considers the entire context within the text, producing an obfuscated version after adding noise to this representation and depending on the privacy parameter epsilon. We test DP-COMET on different Natural Language Processing (NLP) and Information Retrieval (IR) downstream tasks, and our findings show that our obfuscation mechanism not only achieves comparable performance results to traditional term-by-term mechanisms but also produces obfuscated texts less similar to the originals. To promote the reproducibility of DP-COMET, we make the code publicly available at https://github.com/Kekkodf/DP-COMET.
+VeriRel: Verification Feedback to Enhance Document Retrieval for Scientific Fact Checking
Identification of appropriate supporting evidence is critical to the success of scientific fact checking. However, existing approaches rely on off-the-shelf Information Retrieval algorithms that rank documents based on relevance rather than the evidence they provide to support or refute the claim being checked. This paper proposes +VeriRel, which incorporates verification success into document ranking. Experimental results on three scientific fact checking datasets (SciFact, SciFact-Open and Check-Covid) demonstrate consistently leading performance by +VeriRel for document evidence retrieval and a positive impact on downstream verification. This study highlights the potential of integrating verification feedback into document relevance assessment for effective scientific fact checking systems, and it points to promising future work on evaluating fine-grained relevance when examining complex documents for advanced scientific fact checking.
POLAR: Policy Optimization for Literature Analysis under Review Constraints
Systematic reviews are vital for evidence-based decision-making but remain resource-intensive due to the volume of literature requiring expert screening. Technology-Assisted Review (TAR) systems offer a solution by ranking documents for review, yet questions remain about how best to allocate limited human effort across multiple review topics. In this paper, we explore the problem of effort distribution by comparing alternative screening policies under fixed effort constraints. Using real-world data from the CLEF eHealth 2017-2019 TAR tasks, we evaluate both baseline and adaptive policies that account for topic size, screening depth, and residual uncertainty. We introduce effort-aware evaluation metrics to measure trade-offs between review effectiveness and resource use. Our results show that simple, topic-sensitive policies can significantly improve the yield of relevant documents discovered, offering practical insights for scalable and equitable systematic review workflows.
AI on the Pulse: Real-Time Health Anomaly Detection with Wearable and Ambient Intelligence
We introduce AI on the Pulse, a real-world-ready anomaly detection system that continuously monitors patients using a fusion of wearable sensors, ambient intelligence, and advanced AI models. Powered by UniTS, a state-of-the-art (SoTA) universal time-series model, our framework autonomously learns each patient's unique physiological and behavioral patterns, detecting subtle deviations that signal potential health risks. Unlike classification methods that require impractical, continuous labeling in real-world scenarios, our approach uses anomaly detection to provide real-time, personalized alerts for reactive home-care interventions. Our approach outperforms 12 SoTA anomaly detection methods, demonstrating robustness across both high-fidelity medical devices (ECG) and consumer wearables, with a ~22% improvement in F1 score. However, the true impact of AI on the Pulse lies in @HOME, where it has been successfully deployed for continuous, real-world patient monitoring. By operating with non-invasive, lightweight devices like smartwatches, our system proves that high-quality health monitoring is possible without clinical-grade equipment. Beyond detection, we enhance interpretability by integrating LLMs, translating anomaly scores into clinically meaningful insights for healthcare professionals.
Think it Image by Image: Multi-Image Moral Reasoning of Large Vision-Language Models
Vision Language Models (VLMs) have demonstrated remarkable success in downstream applications, yet they often exhibit biases, raising ethical concerns. While previous efforts have aimed to evaluate and improve the moral reasoning capabilities of VLMs, existing approaches are limited by simplified, unimodal settings or overly static visual scenarios. To address these limitations, we propose MIST (Moral Inference through Storytelling with Text and Images), a novel multi-image dataset pipeline designed to assess moral reasoning in complex, dynamic scenarios. To ensure better alignment between the visual and textual modalities, we introduce the concept of ''text-image flow,'' which seamlessly integrates visual and textual information across complex scenarios. Using this dataset, we evaluate seven widely used VLMs, offering critical insights into their performance in moral reasoning tasks.
LLM-OFA: On-the-Fly Adaptation of Large Language Models to Address Temporal Drift Across Two Decades of News
We investigate the problem of on-the-fly adaptation (OFA) with online feedback for large language models (LLMs) in the context of temporally evolving data. In this setting, each incoming instance (or a small batch) is first processed for inference, and its true label is revealed immediately after prediction, allowing the model to be updated in a sequential, single-pass manner. While pre-trained LLMs achieve state-of-the-art results across NLP tasks, they often struggle to generalize under dynamic distribution shifts, particularly in continuously evolving environments. Despite the importance of this problem, existing research on online adaptation of LLMs remains limited, and there is a lack of large-scale benchmarks for evaluating such methods. To address these gaps, we introduce 1M-News, a large-scale benchmark of one million New York Times headlines spanning two decades, and benchmark six state-of-the-art LLMs by fine-tuning them on the first 10 years and applying OFA on the following 10 years. To improve adaptation performance, we develop Adaptimizer, the first optimizer specifically designed for OFA, enabling rapid and stable model updates under temporal distribution shift. Adaptimizer maintains two sets of weights, fast and slow, balancing rapid adaptation with long-term stability and generalization across the stream. Our experiments demonstrate that OFA with Adaptimizer achieves consistent improvements over static baselines. All code and data are publicly available at https://github.com/pouyaghahramanian/LLM-OFA.
Approximating Gradient-Based Influence for Scalable Instruction Data Selection
Instruction Tuning (IT) is crucial for enhancing Large Language Models (LLMs), but training on all available instructions is often unnecessary and computationally costly. Recent studies show that small, well-chosen subsets can match or exceed full dataset performance, motivating efficient data selection techniques. While gradient-based methods like LESS estimate sample influence effectively, they are expensive due to per-sample gradient computation. We propose Approx-LESS, a scalable alternative that computes LoRA-based gradient features for a small fraction of the samples and trains regression models to predict influence scores for the rest. This enables selection and tuning on the most impactful samples. On three validation sets with a fixed 270K instruction corpus, Approx-LESS outperforms applicable baselines and closely matches LESS, reducing gradient extraction time by over 3x. It also shows high sample selection overlap with LESS, making it an effective, low-cost method for influence-based instruction tuning.
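A hedged sketch of the approximation step as the abstract describes it: exact influence scores are computed for a small fraction of samples, and a regressor fit on inexpensive per-sample features predicts scores for the remainder. The use of ridge regression and of generic cheap features as regression inputs are assumptions, not details taken from the paper.

```python
# Illustrative sketch only: predict influence scores for unlabeled samples from
# a small fraction of exact (expensive, gradient-based) scores.
import numpy as np
from sklearn.linear_model import Ridge

def approx_influence_selection(cheap_feats, exact_idx, exact_scores, k):
    """cheap_feats: (N, d) inexpensive per-sample features for the full corpus.
       exact_idx / exact_scores: indices and exact influence scores for the
       small fraction where gradient features were computed.
       Returns indices of the top-k samples by (approximate) influence."""
    reg = Ridge(alpha=1.0).fit(cheap_feats[exact_idx], exact_scores)
    predicted = reg.predict(cheap_feats)          # approximate influence for all samples
    predicted[exact_idx] = exact_scores           # keep exact values where available
    return np.argsort(-predicted)[:k]
```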
Time-Period-Aware Embedding Regeneration for Session-Based Recommendation
Session-based recommender systems typically focus on intra-session user behavior but often overlook the macro-level temporal evolution of items themselves. To address this gap, we introduce a model that explicitly captures item dynamics by regenerating time-period-aware embeddings. Our approach partitions the data timeline into several distinct periods and employs a simple yet effective module, TEG, which uses a GRU and a causal attention layer to recurrently learn how item representations evolve from one period to the next. This allows our model to capture global, long-term trends while its gating mechanism naturally accommodates both dynamic and static items. Extensive experiments on three real-world datasets show that our method achieves highly competitive performance against complex state-of-the-art models. More importantly, it demonstrates the significant and complementary value of modeling global item evolution, providing a new dimension for improving session-based recommendation.
Pseudo-Inverse Prefix Tuning for Effective Unlearning in LLMs
Large Language Models (LLMs) are widely used in many real-world applications, but their deployment raises concerns about data privacy and compliance with regulations such as the right to be forgotten. To address these challenges, we explore the problem of machine unlearning, selectively removing the influence of specific training data from a model. While many existing approaches require retraining the entire model and access to both forget data and retain data, we propose Pseudo-Inverse Prefix Tuning (PI-Prefix), a parameter-efficient fine-tuning method that enables targeted forgetting with minimal overhead. PI-Prefix learns a small set of prefix parameters on the data to be forgotten and then applies a pseudo-inverse transformation to unlearn the forget data while maintaining performance on retain data. Our experiments on two sentiment classification tasks (SST-2 and Yelp) demonstrate that PI-Prefix achieves effective and interpretable forgetting, with forget-set performance approaching random prediction. It preserves strong generalization on the retain set even without requiring it during unlearning. These results highlight PI-Prefix as a promising direction for scalable and compliant unlearning in data removal contexts.
Study on LLMs for Promptagator-Style Dense Retriever Training
Promptagator demonstrated that Large Language Models (LLMs) with few-shot prompts can be used as task-specific query generators for fine-tuning domain-specialized dense retrieval models. However, the original Promptagator approach relied on proprietary and large-scale LLMs which users may not have access to or may be prohibited from using with sensitive data. In this work, we study the impact of open-source LLMs at accessible scales (≤14B parameters) as an alternative. Our results demonstrate that open-source LLMs as small as 3B parameters can serve as effective Promptagator-style query generators. We hope our work will inform practitioners with reliable alternatives for synthetic data generation and give insights to maximize fine-tuning results for domain-specific applications. Our code is available at https://www.github.com/mitll/promptodile
DocPolicyKG: A Lightweight LLM-Based Framework for Knowledge Graph Construction from Chinese Policy Documents
Chinese policy documents are typically written in a concise yet contextually rich style, containing implicit hierarchical logic and strategic intent. These linguistic and structural features present challenges for traditional information extraction methods, which often struggle with cross-sentence dependencies and semantic complexity. To address these, we propose DocPolicyKG, a novel framework for constructing knowledge graphs from Chinese policy documents using lightweight large language models (LLMs), integrating domain ontology, fine-tuning, and prompt engineering. Focusing on the investment promotion policies in China, experimental results demonstrate that DocPolicyKG significantly outperforms the base model Deepseek-R1-7B and achieves performance competitive with GPT-4o on both named entity recognition (NER) and relation triplet extraction (RTE) tasks. Building on DocPolicyKG, we construct the first large-scale knowledge graph of Chinese investment promotion policies, and further integrate Graph Retrieval-Augmented Generation (Graph RAG) to support policy question answering through entity-relation reasoning and semantic retrieval. The graph data is publicly available at: github.com/hanshenmesen/DocPolicyKG.
RAG-based Unanswerable Question Detection in Clinical Text-to-SQL
Large language models (LLMs) have shown exceptional performance in various tasks, particularly in zero-shot and few-shot settings. However, in sensitive domains like healthcare, detecting unanswerable questions remains a critical challenge. This task is challenging due to data imbalance, and existing methods are computationally expensive and inflexible to data distribution changes. To address these issues, we propose Retrieval-augmented Question Answerability Detection (RaQAD), a training-free method that uses LLMs to identify unanswerable questions by retrieving semantically similar examples as few-shot prompts. RaQAD ensures semantically similar sampling, adapts to schema changes, and eliminates the need for additional training. Extensive experiments on clinical datasets demonstrate its effectiveness in outperforming existing approaches while addressing data imbalance challenges.
From Bicliques to BiFlexi Cliques: A New Era of Bipartite Subgraph Discovery
Real-world bipartite communities tend to exhibit relaxed internal connectivity as their size increases, making traditional biclique models too restrictive for cohesive subgraph discovery. In this paper, we propose BiFlexi, a novel bipartite subgraph model that employs flexible, size-adaptive degree thresholds based on sublinear constraints. Our approach dynamically adjusts connectivity requirements according to subgraph size, enabling the discovery of larger and more realistic cohesive structures. We prove that the Maximum BiFlexi problem is NP-hard and develop an efficient heuristic algorithm. Experimental results on real-world datasets demonstrate the effectiveness and scalability of our algorithm and the applicability of our model.
Mixture-of-KAN for Multivariate Time Series Forecasting
Multivariate time series forecasting is a crucial task that predicts future states based on historical inputs. Although current deep learning-based methods have made significant advancements, they still face the criticism of lacking interpretability. The rise of the Kolmogorov-Arnold Network (KAN) provides a new perspective for implementing an efficient and interpretable deep learning-based method for forecasting time series. However, we find two main challenges in applying KAN to time series forecasting: how to select the appropriate one among the various KAN variants and how to train the deep KAN-based network. To this end, we propose the multi-layer mixture-of-KAN network, which achieves excellent performance while retaining KAN's ability to be transformed into a combination of symbolic functions. The core module is the mixture-of-KAN layer, which uses a mixture-of-experts structure to assign variables to best-matched KAN experts. Then, we analyze the shortcomings of parameter initialization in the original KAN and provide an effective initialization method to alleviate training instability. Extensive experimental results demonstrate that our proposed method is effective in multivariate time series forecasting. Code is released at https://github.com/2448845600/EasyTSF.
Information Diffusion Prediction Based on User Multi-Dimensional Feature Interaction
Information diffusion prediction, the forecasting of propagation paths, provides critical insights into information spread mechanisms, directly enabling applications such as misinformation spread forecasting and malicious account detection. Prior research has primarily focused on combining user social graphs and information cascades for prediction, often overlooking the distinct role characteristics users exhibit during interactions. Classifying users into different roles enables the construction of a multi-layered social graph, facilitating the extraction of deeper user features. This paper introduces a model that leverages multi-dimensional interactions between user features. Specifically, to account for users' dynamic preferences, we construct sequential hypergraphs from information cascades using timestamps and utilize a hypergraph neural network to extract users' dynamic features. Furthermore, to capture users' static features, we build multi-layer social networks from the social graph based on users' roles. We employ graph convolutional networks to separately extract static features from each layer and subsequently fuse them using an attention mechanism. Experimental validation on real-world datasets against cutting-edge baselines demonstrates the superior performance of our framework.
Sparse Autoencoders in Collaborative Filtering Enhanced LLM-based Recommender Systems
Large language models (LLMs) have demonstrated remarkable capabilities in recommendation tasks. Recently, efforts have been made to further enhance LLM performance with collaborative knowledge learned from traditional recommender systems. One approach is to inject learned embeddings into LLM prompts through a trainable projector, yet these embeddings could carry noisy or irrelevant information. In this paper, we propose using sparse autoencoders to improve input prompts. We show that sparse autoencoders can learn highly interpretable embeddings and extract key collaborative features in the case of recommender systems. With the help of sparse autoencoders, we are able to extract collaborative features to augment input prompts. By capturing the TopK features of each item, we mitigate noisy information from item embeddings, so sparse autoencoders can also help denoise embeddings in prompts. We develop two methods that utilize sparse autoencoders to augment or denoise input prompts. We evaluate the proposed methods on three real-world datasets, and both show promising performance improvements.
Assessing Natural Language Explanations of Relational Graph Neural Networks
Although relational graph neural networks (RGNNs) excel at learning from graph-structured data as it appears in knowledge graphs, they often lack interpretability. While natural language (NL) explanations offer a promising solution, evaluating these explanations remains largely unaddressed. There is no unified evaluation framework including evaluation metrics, benchmarking datasets, and established evaluation procedures. This paper introduces NLEF, a novel NL Evaluation Framework to assess the quality of NL explanations for RGNNs. It uses the NL explanations to make new predictions and assesses the extent to which they align with the predictions produced by the RGNN. Towards this end, we propose two methods: (1) we convert NL explanations to description logics (DL) and use a DL reasoner for node classification; (2) we use a retrieval-augmented generation (RAG) approach for node classification. Our evaluation results show that our DL method is highly scalable, whereas the RAG approach often yields the highest performance.
Structuring Data Science Automation: A Competency-Aware Taxonomy Approach
The growing number of data science (DS) automation frameworks complicates the selection of suitable tools for project-specific tasks. Current overviews emphasize functional capabilities or pipeline coverage, yet often overlook how tools align with DS workflows and user competencies. This paper introduces a first approach toward a competency-focused taxonomy of DS automation tools. Based on a structured literature review, we identify which CRISP-DM tasks are automated, how automation is achieved, and which DS competencies are required for effective use. We classify tools across four dimensions: CRISP-DM stage, task function, degree of automation, and required competencies. This taxonomy enables practitioners to match tools with workflow needs and team skill sets, while also clarifying human-in-the-loop dependencies. We validate the taxonomy through expert interviews and case mappings, demonstrating its practical value in identifying competency gaps and guiding framework adoption.
Adaptive Spike Neural Networks for Natural Language Inference Tasks with Dynamic Spike Predictor
Spiking Neural Networks (SNNs) offer energy efficiency and are promising candidates for ultra-low-power inference on neuromorphic hardware. While extensively studied in computer vision, their application in Natural Language Processing remains limited and underexplored. Three significant challenges of the existing work are as follows: (1) spike firing functions are sensitive to initial conditions, (2) spike timings are stochastic even for identical token inputs, preventing the stable preservation of contextual information, and (3) the analysis of how spike occurrences affect learning effectiveness is limited. To improve learning efficiency and stability, we propose the Dynamic Spike Predictor (DSP), which adaptively regulates spike generation. DSP predicts a scale-adjusted input current at each time step to regulate spike activity, maintaining stable gradient flow, with only about 0.2% additional parameters relative to the backbone SNNs. We validate its effectiveness through comprehensive experiments on three NLI benchmarks (CB, RTE, and SICK), addressing research questions on the learning performance, robustness, and extensibility of DSP. The code is available at https://github.com/bigbases/Spike-Predictor.
Context-Aware Fine-Grained Graph RAG for Query-Focused Summarization
Retrieval-Augmented Generation (RAG) enables large language models to provide more precise and pertinent responses by incorporating external knowledge. In the Query-Focused Summarization (QFS) task, GraphRAG-based approaches have notably enhanced the comprehensiveness and diversity of generated responses. However, existing GraphRAG-based approaches lack sufficient fine-grained contextual information during graph retrieval, resulting in LLMs being unable to accurately understand the detailed and specific background knowledge of a query. To address it, we propose Context-Aware Fine-Grained Graph RAG (FG-RAG). On the one hand, FG-RAG employs Context-Aware Entity Expansion in graph retrieval to provide more contextual information for the retrieved content. On the other hand, FG-RAG utilizes Query-Level Fine-Grained Summarization to incorporate fine-grained details during response generation, enhancing query awareness for the generated summarization. Our evaluation demonstrates that FG-RAG outperforms other RAG systems in multiple metrics of comprehensiveness, diversity, and empowerment when handling the QFS task. Our implementation is available at https://github.com/BuptWululu/FG-RAG.
Spatio-Temporal Residual Masked Autoencoder for Urban Rent Estimation
Housing affordability has become a critical issue in many cities, but gaps in rental transaction records hinder accurate rent estimation. Conventional spatial interpolation and time-series regression methods either ignore temporal trends or over-smooth spatial variation, leading to biased rent estimates in under-reported areas. While recent masked autoencoding techniques for tabular imputation address feature-wise missingness, they do not explicitly model the joint spatio-temporal structure of urban rent dynamics. This paper proposes a Spatio-Temporal Residual Masked Autoencoder (ST-ResMAE) that reconstructs masked rental values by integrating continuous covariates with learnable spatio-temporal embeddings and by modelling residuals over both space and time. A case study on rental data from selected Australian urban suburbs covering 2020 to 2024 shows that ST-ResMAE reduces imputation error by 5% relative to recent masked-autoencoding methods and by 15% relative to traditional regression models. These results demonstrate ST-ResMAE's ability to capture complex spatio-temporal rent dynamics even when data are sparse.
XDNet: Disentangled Time Series Forecasting via Exponential Decomposition and 2D Periodic Modeling
In time series analysis, disentangling long-term trends and seasonal patterns is crucial for capturing multi-scale temporal structures and improving both interpretability and forecasting accuracy. Recently, 2D modeling techniques have been incorporated into multivariate forecasting frameworks to better exploit periodic patterns. However, conventional decomposition methods often rely on simplistic moving averages that obscure critical patterns, while 2D modeling may entangle global trends with local variations and fail to normalize seasonal amplitudes, ultimately impairing both interpretability and forecast accuracy. To overcome these limitations, we propose XDNet (Exponential-Dimensional Network), a principled forecasting framework that explicitly disentangles trend and seasonal dynamics. At its core lies the Exponentially Weighted Decomposition (XWD), which applies decaying weights to past observations to preserve the integrity of long-term trends while adaptively normalizing seasonal fluctuations. The trend component is modeled using Temporal Kolmogorov-Arnold Networks (KAN) to capture intricate nonlinear dynamics, while the seasonal component is processed through a refined Inception-based module that robustly extracts fine-grained periodic dependencies. Extensive experiments on multiple benchmark datasets demonstrate that XDNet achieves state-of-the-art forecasting performance, delivering up to a 2.79% improvement in average accuracy over leading baselines, particularly in long-horizon prediction tasks.
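The exponentially weighted decomposition can be pictured with a short numerical sketch, assuming a single smoothing factor for both the trend recursion and the adaptive amplitude normalization; the exact recursion and parameter values are illustrative, not the paper's XWD definition.

```python
# Illustrative sketch only: exponentially weighted trend + amplitude-normalized seasonality.
import numpy as np

def xwd_decompose(x, alpha=0.1, eps=1e-8):
    """x: 1D array of observations. Returns (trend, seasonal, scale).
    Trend: exponentially weighted moving average (decaying weights on the past).
    Seasonal: residual, normalized by an EWMA of its own magnitude."""
    x = np.asarray(x, dtype=float)
    trend = np.empty_like(x)
    trend[0] = x[0]
    for t in range(1, len(x)):
        trend[t] = alpha * x[t] + (1 - alpha) * trend[t - 1]
    residual = x - trend
    scale = np.empty_like(residual)
    scale[0] = abs(residual[0]) + eps
    for t in range(1, len(residual)):
        scale[t] = alpha * abs(residual[t]) + (1 - alpha) * scale[t - 1]
    seasonal = residual / (scale + eps)          # adaptively normalized seasonal component
    return trend, seasonal, scale
```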
Jailbreaking LLMs Through Alignment Vulnerabilities in Out-of-Distribution Settings
Recently, Large Language Models (LLMs) have shown remarkable capabilities, but concerns about their trustworthiness-especially under ''jailbreaking'' attacks-remain unresolved. Prior work often assumes white-box access or relies on fixed prompt templates, limiting practicality. We propose ObscurePrompt, a simple yet effective black-box jailbreak method inspired by fragile LLM alignment on Out-of-Distribution (OOD) inputs. ObscurePrompt constructs base prompts using existing jailbreak techniques, then employs powerful LLMs to iteratively generate obscure variants that evade detection. Extensive experiments demonstrate that ObscurePrompt outperforms existing methods and remains effective against two widely-used defenses.
FASE: Feature-Aligned Scene Encoding for Open-Vocabulary Object Detection in Remote Sensing
Open-vocabulary object detection (OVD) in remote sensing (RS) has shown remarkable generalization capabilities across diverse RS imagery through alignment between image and text embeddings. Such methods have further improved detection performance by incorporating additional scene-level context from both visual and textual domains. However, existing methods approximate scene context by simply averaging the text embeddings of the image's object labels, which is insufficient to capture the rich linguistic context present in RS scenes. To address this limitation, we propose a novel Feature-Aligned Scene Encoding (FASE), which constructs comprehensive scene representations through high-quality captions generated by a specialized vision-language model. Our Feature Alignment Module (FAM) creates a robust scene representation by fusing domain-specific caption embeddings with general text features through dual-branch fusion with gating and cross-attention. This resulting representation then facilitates the alignment with visual features. By utilizing enhanced scene encoding only during training, our method internalizes rich contextual knowledge without increasing inference complexity. Experiments on multiple benchmarks demonstrate significant improvements over state-of-the-art methods, validating the effectiveness of our approach for OVD in RS.
Multimodal RAG Enhanced Visual Description
Textual descriptions for multimodal inputs entail recurrent refinement of queries to produce relevant output images. Despite efforts to address challenges such as scaling model size and data volume, the cost associated with pre-training and fine-tuning remains substantial. However, pre-trained large multimodal models (LMMs) encounter a modality gap, characterised by a misalignment between textual and visual representations within a common embedding space. Although fine-tuning can potentially mitigate this gap, it is typically expensive and impractical due to the requirement for extensive domain-driven data. To overcome this challenge, we propose a lightweight training-free approach utilising Retrieval-Augmented Generation (RAG) to bridge the modality gap using a linear mapping, which can be computed efficiently. Our reproducible code can be found at https://github.com/amitkumarj441/mRAG-gim. During inference, this mapping is applied to images embedded by an LMM, enabling retrieval of the closest textual descriptions from the training set. These textual descriptions, in conjunction with an instruction, serve as the input prompt for the language model to generate new textual descriptions. In addition, we introduce an iterative technique for distilling the mapping by generating synthetic descriptions via the language model, facilitating optimisation for standard image description measures. Experimental results on two benchmark multimodal datasets demonstrate significant improvements.
An Efficient PIM-Based Graph Engine on a Single Machine
With the increasing size of real-world networks, efficient analysis of large-scale graphs has become an important research area. To this end, we can consider Processing-in-Memory (PIM), which integrates processing units and main memory into a single chip, as a promising solution. Many studies have focused on enabling highly efficient processing of memory-intensive tasks by using PIM's high internal bandwidth. To the best of our knowledge, however, there have been no studies related to the scenarios where the entire graph does not fit in main memory and data movement across storage, memory, and cache should be considered. Motivated by this, we propose RealGraphPIM, a new PIM-based graph engine that processes large-scale real-world graphs efficiently on top of the original RealGraph, a state-of-the-art CPU-based graph engine. RealGraphPIM employs (1) asynchronous I/O to reduce time wasted in an idle state and (2) column-wise partitioning to reduce CPU workloads, thereby issuing I/O requests more frequently. Experimental results on real-world datasets show that RealGraphPIM dramatically outperforms state-of-the-art graph engines, including a naive version of RealGraphPIM.
Leveraging Large Language Models for Complementary Product Ads Recommendation
Complementary products that fulfill a joint need (e.g., a phone case for a smartphone) are often overlooked by dynamic product advertising (DPA) systems despite their success on e-commerce websites such as Amazon. Existing works on complementary product recommendation focus on mining frequently co-purchased products but suffer from low accuracy as co-purchased products are not always complements to each other. More recent works rely on human annotators to clean co-purchased product pairs and use them to train end-to-end models for complementary product recommendation. However, unlike e-commerce websites, DPA systems usually do not have access to users' complete shopping history, making the identification of co-purchased products challenging. Moreover, depending on the product types, identifying the complements of a given product may require extensive domain knowledge that is not present in a pair of complementary products. In this work, we propose a novel generate-and-retrieve paradigm to make complementary product recommendations and explore the use of LLMs for this task. Specifically, we rely on LLMs to generate queries that describe the complements of an original product. The generated queries are then used to retrieve relevant products from a product index. The retrieved products are expected to be complementary to the original product. We design experiments using the public Amazon ESCI datasets and compare in-context learning with parameter-efficient fine-tuning using models from the GPT and Gemini families for complementary product generation. Our evaluation shows that by leveraging the extensive knowledge of LLMs on product relationships, pre-trained LLMs with proper prompting and only a small number of human-annotated examples outperform LLMs fine-tuned with tens of thousands of human-annotated examples.
Watch Your Step: A Fine-Grained Evaluation Framework for Multi-hop Knowledge Editing in Large Language Models
Knowledge editing allows for targeted updates of specific factual information in Large Language Models (LLMs). While existing methods can effectively update localized facts, they often struggle to coherently integrate these updates into the model's broader knowledge structure. Multi-hop knowledge editing addresses this issue by aiming to ensure that edited information is consistently reflected throughout the multi-hop reasoning process. However, current evaluation methods primarily assess the correctness of the final answer, which cannot guarantee that the edited knowledge has been correctly integrated into the reasoning process. To address these limitations, we propose a novel evaluation framework to systematically examine how edited knowledge is integrated within a multi-hop reasoning process. We introduce three types of entity-level errors: (i) Entity Persistence, where outdated entities remain; (ii) Entity Mismatch, where unrelated entities appear; and (iii) Entity Distortion, where entities are morphologically distorted, such as misspellings or truncations. Our analysis reveals that these errors frequently occur even when the final answer is correct. Moreover, when the final answer is incorrect, Entity Mismatch Errors are commonly observed, indicating unintended side effects of knowledge editing. The code is available at https://github.com/KUNLP/multihop-edit-eval.
M3-Net: A Cost-Effective Graph-Free MLP-Based Model for Traffic Prediction
Achieving accurate traffic prediction is a fundamental yet challenging task in the development of current intelligent transportation systems. The complexity and computational cost of existing approaches pose significant challenges for the efficient deployment and operation of deep learning models on large-scale datasets. To address these challenges, we propose M3-Net, a cost-effective graph-free Multilayer Perceptron (MLP) based model for traffic prediction. Extensive experiments conducted on multiple real datasets demonstrate the superiority of the proposed model in terms of prediction performance and lightweight deployment. Our code is available at https://github.com/jinguangyin/M3_NET
Imputing Multi-Agent Trajectories from Event and Snapshot Data in Soccer
Recent advances in wearable sensors and computer vision technologies have enabled the collection of tracking data in team sports, which has become a core resource for fine-grained analysis. However, the availability of tracking data remains constrained by high acquisition costs and technical limitations. Compared to tracking data, event data recording on-ball actions and snapshot data providing partial player positions are more widely accessible. To this end, this study proposes a novel approach to predict the positions of all players at each event timestamp in soccer matches, leveraging the limited information available from event and snapshot data. We propose an event-based imputation model that integrates spatial and temporal attention to capture the spatiotemporal multi-agent structure. In experiments, we evaluate our model on 13 soccer matches, achieving average position errors of 5.84 m. To assess the practical utility of our approach, we apply it to a downstream task called Pitch Control, which requires full tracking data. These results highlight the potential of event-based position imputation to expand access to fine-grained analysis in data-constrained settings.
CoCoTen: Detecting Adversarial Inputs to Large Language Models through Latent Space Features of Contextual Co-occurrence Tensors
The widespread use of Large Language Models (LLMs) in many applications marks a significant advance in research and practice. However, their complexity and hard-to-understand nature make them vulnerable to attacks, especially jailbreaks designed to produce harmful responses. To counter these threats, developing strong detection methods is essential for the safe and reliable use of LLMs. This paper studies this detection problem using the Contextual Co-occurrence Matrix, a structure recognized for its efficacy in data-scarce environments. We propose a novel method leveraging the latent space characteristics of Contextual Co-occurrence Matrices and Tensors for the effective identification of adversarial and jailbreak prompts. Our evaluations show that this approach achieves a notable F1 score of 0.83 using only 0.5% of labeled prompts, which is a 96.6% improvement over baselines. This result highlights the strength of our learned patterns, especially when labeled data is scarce. Our method is also significantly faster, with speedups ranging from 2.3x to 128.4x compared to the baseline models.
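As a loose illustration of latent-space detection on co-occurrence structure, the sketch below builds a term-term co-occurrence matrix, projects prompts into a truncated-SVD latent space, and fits a light classifier on the small labeled fraction; the vectorizer settings, SVD rank, and classifier choice are assumptions, not the CoCoTen pipeline.

```python
# Illustrative sketch only: co-occurrence latent features for prompt screening.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

def cooccurrence_detector(labeled_prompts, labels, n_components=32):
    """labeled_prompts: list of strings (the small labeled fraction);
    labels: 0/1 array (benign vs. adversarial). Assumes the vocabulary
    is larger than n_components."""
    vec = CountVectorizer(ngram_range=(1, 2), min_df=2)
    counts = vec.fit_transform(labeled_prompts)        # (n_prompts, n_terms)
    cooc = counts.T @ counts                           # term-term co-occurrence matrix
    svd = TruncatedSVD(n_components=n_components).fit(cooc)
    feats = np.asarray(counts @ svd.components_.T)     # prompts in the latent space
    clf = LogisticRegression(max_iter=1000).fit(feats, labels)
    return vec, svd, clf                               # reuse to score new prompts
```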
Sarcasm Subtype-Specific Reasoning in Dialogue with Multimodal Cues Using Large Language Models
Sarcasm is a nuanced form of human communication characterized by a mismatch between an utterance and the speaker's intent or contextual cues. Recent studies have aimed to advance the understanding of sarcasm by developing systems capable of generating rationales behind sarcastic expressions. In particular, multimodal cues such as facial expressions and vocal tone have been crucial indicators when semantic incongruity with the utterance is prominent. However, existing multimodal sarcasm reasoning approaches fall short of providing fine-grained explanations. Sarcasm can be further categorized into specific subtypes based on the forms of inversion it employs, such as nonverbal cues, dialogue context, and exaggerated word emphasis. To address this, we introduce a novel task called Sarcasm Subtype-specific Reasoning Generation (SSRG). To facilitate research on this task, we present the Sarcasm Subtype-specific Reasoning Dataset (SSRD), which establishes a new benchmark for fine-grained sarcasm reasoning. Through extensive experiments, we demonstrate that leveraging multimodal cues significantly enhances subtype-specific sarcasm reasoning. Moreover, we show that integrating these multimodal cues into textual representations enables strong performance even when using only large language models (LLMs).
When Language Shapes Thought: Cross-Lingual Transfer of Factual Knowledge in Question Answering
Multilingual large language models (LLMs) offer promising opportunities for cross-lingual information access, yet their use of factual knowledge remains highly sensitive to the input language. Prior work has addressed this through English prompting and evaluation, assuming that English-based reasoning is universally beneficial. In this work, we challenge that assumption by exploring factual knowledge transfer from non-English to English through the lens of Language and Thought Theory. We introduce Language-to-Thought (L2T) prompting, which aligns the model's internal ''thinking'' language with the source of knowledge. Across three languages and four models, L2T consistently outperforms English-based reasoning, reversing the expected advantage of English prompts. Our code is available at https://github.com/GeomeunByeol/Language2Thought.
Jailbreaking LLMs Through Cross-Cultural Prompts
We examine how linguistic and cultural framing affect jailbreak success in three commercial LLMs (GPT-4, Claude 3, Gemini), using semantically equivalent prompts in direct, indirect, and metaphorical styles across four high-resource languages. Indirect prompts most effectively bypassed filters, with framing and style significantly influencing alignment. GPT-4 was especially vulnerable to indirect framing, Claude 3 remained consistently robust, and Gemini showed high sensitivity to cultural and linguistic variation. Our findings highlight the need for alignment strategies resilient to diverse expression styles and cultural contexts.
When User Engagement Meets Structural Cohesiveness: A Decay-Driven Approach to Hypergraph Cores
Cohesive subgraph discovery in hypergraphs is essential for analysing complex group interactions in various domains such as e-commerce, social media, and collaboration networks. However, existing models are vulnerable to large hyperedges that artificially inflate connectivity, obscuring meaningful structure. We propose the (k,s)-core, a new model requiring each node to have at least k neighbours with a minimum interaction strength s, measured via a size-sensitive decay function. This penalises noisy co-occurrences while preserving strong local patterns. We develop an efficient algorithm with theoretical guarantees, and experiments on real-world datasets demonstrate improved compactness and robustness over prior methods.
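A peeling-style sketch of such a decay-driven core model is shown below; the particular size-sensitive decay (1 / |e|^lambda) and the stopping rule are assumptions made for illustration, not the paper's exact (k,s)-core definition or algorithm.

```python
# Illustrative sketch only: peel nodes until every survivor has >= k neighbours
# whose accumulated, size-decayed interaction strength is >= s.
def ks_core(hyperedges, k, s, lam=1.0):
    """hyperedges: iterable of node collections. Returns the surviving node set."""
    nodes = set().union(*hyperedges) if hyperedges else set()
    while True:
        strength = {u: {} for u in nodes}
        for e in hyperedges:
            live = [u for u in e if u in nodes]
            if len(live) < 2:
                continue
            w = 1.0 / (len(live) ** lam)          # large hyperedges contribute less
            for u in live:
                for v in live:
                    if u != v:
                        strength[u][v] = strength[u].get(v, 0.0) + w
        drop = {u for u in nodes
                if sum(1 for sv in strength[u].values() if sv >= s) < k}
        if not drop:
            return nodes                          # stable: every node satisfies (k, s)
        nodes -= drop
```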
CR-SGCN: Unsupervised Signed Community Detection via Conductance Regularization
Community detection in signed networks is challenging due to the presence of both positive and negative edges, which violate the homophily assumption commonly used in traditional methods. In this paper, we present CR-SGCN, an unsupervised framework for community detection in signed networks. It combines a signed GCN encoder, a soft community assignment layer, and a degree-corrected stochastic block model decoder. To enhance boundary separation, we introduce an edge-level signed conductance regularization that pulls intra-community embeddings closer and pushes inter-community ones apart. Without requiring labels, CR-SGCN effectively captures community structure even under edge sparsity. Experiments on real-world signed networks show consistent gains in signed modularity and structural separation over existing baselines. The results demonstrate the robustness and effectiveness of CR-SGCN for unsupervised signed community detection.
Upcycling Candidate Tokens of Large Language Models for Query Expansion
Query Expansion (QE) improves retrieval performance by enriching queries with related terms. Recently, Large Language Models (LLMs) have been used for QE, but existing methods face a trade-off: generating diverse terms boosts performance but increases computational cost. To address this challenge, we propose Candidate Token Query Expansion (CTQE), which extracts diverse and relevant terms from a single LLM decoding pass by leveraging unselected candidate tokens. These tokens, though not part of the final output, are conditioned on the full query and capture useful information. By aggregating them, CTQE achieves both relevance and diversity without extra inference, reducing overhead and latency. Experiments show that CTQE delivers strong retrieval performance with significantly lower cost, outperforming or comparable to more expensive methods. Code is available at: https://github.com/bluejeans8/CTQE
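The candidate-token idea can be sketched as follows, assuming a Hugging Face-style causal language model interface and greedy decoding; the value of k, the filtering rule, and the helper names are illustrative assumptions, not the released CTQE code.

```python
# Illustrative sketch only: harvest unselected top-k candidate tokens during a
# single greedy decoding pass and reuse them as expansion terms.
import torch

@torch.no_grad()
def candidate_token_expansion(model, tokenizer, input_ids, max_new_tokens=32, k=5):
    """input_ids: (1, seq) tensor encoding the query prompt.
    Returns a set of candidate expansion terms."""
    expansion_ids = set()
    ids = input_ids
    for _ in range(max_new_tokens):
        logits = model(ids).logits[:, -1, :]          # next-token distribution
        topk = torch.topk(logits, k + 1, dim=-1).indices[0]
        chosen = topk[0].view(1, 1)                   # greedy token actually emitted
        expansion_ids.update(topk[1:].tolist())       # keep the unselected runners-up
        ids = torch.cat([ids, chosen], dim=-1)
    terms = {tokenizer.decode([i]).strip() for i in expansion_ids}
    return {t for t in terms if t.isalpha()}          # crude filtering (assumption)
```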
RadialFocus: Geometric Graph Transformers via Distance-Modulated Attention
Graph Transformers (GTs) excel at long-range reasoning on graphs but often rely on costly positional encodings or auxiliary virtual nodes to perceive geometry. We present the RadialFocus Graph Transformer (RadialFocus), a geometry-aware GT that learns to modulate attention with a lightweight, distance-selective kernel. Each head is equipped with a differentiable radial basis function whose centre μ and width σ are trained end-to-end, boosting attention between nodes that lie inside its adaptive ''focus'' while gently suppressing others. Injecting the logarithm of this kernel into the pre-softmax logits preserves the stability and permutation invariance of standard self-attention, incurs negligible memory overhead, and removes the need for hand-crafted 3-D encodings or virtual nodes. On 3-D molecular benchmarks RadialFocus attains a validation MAE of 46.3 meV on PCQM4Mv2 with only 13 M parameters, surpassing models an order of magnitude larger. It also sets a new best average ROC-AUC (79.1%) on MoleculeNet and reaches 0.957 MAE on PDBBind2020, a new high-water mark for binding-affinity prediction. The same architecture transfers to 2-D graphs, achieving 97.8% accuracy on MNIST-Superpixel. Ablation studies indicate that the learned (μ, σ) capture task-relevant distance scales and that log-space fusion stabilises gradients. These findings suggest that a simple, learned distance modulation suffices to equip Transformers with strong geometric priors, enabling accurate and parameter-efficient reasoning across diverse graph domains.
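A compact sketch of distance-modulated attention in this spirit: each head carries a learnable radial basis function over pairwise distances whose logarithm is added to the pre-softmax logits. Initial values and tensor shapes are assumptions; this is not the RadialFocus implementation.

```python
# Illustrative sketch only: per-head Gaussian distance kernel fused into attention logits.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RadialBias(nn.Module):
    """Learnable (mu, sigma) per head; returns the log of a Gaussian kernel
    (up to a constant) over pairwise node distances."""
    def __init__(self, num_heads, init_mu=2.0, init_sigma=1.0):
        super().__init__()
        self.mu = nn.Parameter(torch.full((num_heads,), init_mu))
        self.log_sigma = nn.Parameter(torch.full((num_heads,), init_sigma).log())

    def forward(self, dist):                          # dist: (batch, n, n)
        sigma = self.log_sigma.exp().view(1, -1, 1, 1)
        mu = self.mu.view(1, -1, 1, 1)
        d = dist.unsqueeze(1)                         # (batch, heads, n, n)
        return -((d - mu) ** 2) / (2 * sigma ** 2)    # log-space additive bias

def distance_modulated_attention(q, k, v, dist, radial_bias):
    """q, k, v: (batch, heads, n, head_dim); dist: (batch, n, n)."""
    logits = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    logits = logits + radial_bias(dist)               # inject geometric prior pre-softmax
    return F.softmax(logits, dim=-1) @ v
```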
Learning Short-Term and Long-Term Patterns of High-Order Dynamics in Real-World Networks
Real-world networks have high-order relationships among objects and they evolve over time. To capture such dynamics, many methods have been studied across a range of fields. Via an in-depth preliminary analysis, we observe two important characteristics of high-order dynamics in real-world networks: high-order relations tend to (O1) have a structural and temporal influence on other relations in the short term and (O2) periodically re-appear in the long term. In this paper, we propose LINCOLN, a method for Learning hIgh-order dyNamiCs Of reaL-world Networks, that employs (1) bi-interactional hyperedge encoding for short-term patterns, (2) periodic time injection, and (3) intermediate node representation for long-term patterns. Via extensive experiments, we show that LINCOLN outperforms nine state-of-the-art methods in the dynamic hyperedge prediction task.
Side Information Memory Network: Expanding the Breadth of User Behavior Sequences in Recommendation
Research on sequence-based ranking models has been a popular field in recommendation systems. In recent years, numerous researchers have devoted themselves to expanding the content of user behavior sequences, such as longer sequences, more types of sequences, and more variable sequence periods. These studies have achieved promising results, especially in the direction of longer sequences, where a large number of industrial recommendation systems have demonstrated that longer sequences can lead to better performance. However, as the sequence length approaches the upper limit of user behavior occurrences, the marginal benefit of increasing sequence length is gradually diminishing. Against this backdrop, this paper proposes a sequence expansion framework based on Side Information Memory Network (SIMN). Based on SIMN, theoretically all item-side features can be incorporated into the sequence, while avoiding additional sample development costs and storage costs. Furthermore, considering the application of this framework in small to medium-sized recommendation systems, this paper proposes a Feature Auto Encoder-Decoder (FAED) module, which further reduces the storage cost of SIMN. This paper integrates SIMN and FAED into a unified multitask training framework for modeling and validates it on two industrial datasets. Experimental results demonstrate that SIMN-FAED can be integrated with most mainstream sequence modeling methods and achieve better performance, with broad application prospects.
On Evaluating Loss Functions for Stock Ranking: An Empirical Analysis with Transformer Model
Quantitative trading strategies rely on accurately ranking stocks to identify profitable investments. Effective portfolio management requires models that can reliably order future stock returns. Transformer models are promising for understanding financial time series, but how different training loss functions affect their ability to rank stocks well is not yet fully understood. Financial markets are challenging due to their changing nature and complex relationships between stocks. Standard loss functions, which aim for simple prediction accuracy, are often insufficient, as they do not directly teach models to learn the correct order of stock returns. While many advanced ranking losses exist in fields such as information retrieval, there has not been a thorough comparison of how well they work for ranking financial returns, especially when used with modern Transformer models for stock selection. This paper addresses this gap by systematically evaluating a diverse set of advanced loss functions, including pointwise, pairwise, and listwise losses, for daily stock return forecasting to facilitate rank-based portfolio selection on S&P 500 data. We focus on assessing how each loss function influences the model's ability to discern profitable relative orderings among assets. Our research contributes a comprehensive benchmark revealing how different loss functions impact a model's ability to learn cross-sectional and temporal patterns crucial for portfolio selection, thereby offering practical guidance for optimizing ranking-based trading strategies.
Mitigating Knowledge Degradation Caused by Knowledge Editing on Identical Subjects through Two-Step Editing
Large Language Models (LLMs) acquire extensive factual knowledge from large-scale datasets and demonstrate remarkable performance across various tasks. However, since real-world knowledge is constantly changing, it is necessary to modify or expand the model's knowledge. To achieve this, knowledge editing techniques are employed to correct inaccurate or outdated information and inject new knowledge, thereby ensuring that the model remains current. However, in existing subject-centered editing approaches, repeatedly editing the same subject can lead to knowledge degradation, where previously edited knowledge is forgotten. In this paper, we analyze the causes of this knowledge degradation phenomenon and propose a two-step editing method that independently edits subjects and relations to mitigate this issue. Our method effectively alleviates knowledge degradation compared to existing knowledge editing techniques, achieving average performance improvements of 22.9% in multi-edit scenarios and 7.2% in sequential editing.
Spectral Edge Encoding - SEE: Does Structural Information Really Enhance Graph Transformer Performance?
We propose Spectral Edge Encoding (SEE), a parameter-free framework that quantifies each edge's contribution to the global structure by measuring spectral shifts in the Laplacian eigenvalues. SEE captures the low-frequency sensitivity of edges and integrates these scores into graph Transformer attention logits as a structure-aware bias. When applied to the Moiré Graph Transformer (MoiréGT) and evaluated on seven MoleculeNet classification benchmarks, SEE consistently improves ROC-AUC performance. In particular, MoiréGT+SEE achieves an average ROC-AUC of 85.3%, approximately 7.1 percentage points higher than the previous state-of-the-art model UniCorn (78.2%). Moreover, SEE preserves molecular topology and enables edge-level interpretability, offering a practical alternative to sequence-based chemical language models. These results demonstrate that spectrum-informed attention can simultaneously enhance performance and transparency in graph-based molecular modeling.
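To make the mechanism concrete, here is a minimal sketch (not the authors' code) of spectrum-based edge scoring: each edge is scored by how much its removal shifts the low end of the Laplacian spectrum, and the scores can then be added to attention logits as a bias. The eigenvalue cutoff, the scaling factor, and the NetworkX-based implementation are illustrative assumptions.

```python
import numpy as np
import networkx as nx

def spectral_edge_scores(G, k=4):
    """Score each edge by how much removing it perturbs the k smallest Laplacian eigenvalues."""
    base = np.sort(nx.laplacian_spectrum(G))[:k]
    scores = {}
    for u, v in G.edges():
        H = G.copy()
        H.remove_edge(u, v)
        pert = np.sort(nx.laplacian_spectrum(H))[:k]
        scores[(u, v)] = float(np.abs(base - pert).sum())  # low-frequency sensitivity of the edge
    return scores

G = nx.karate_club_graph()
scores = spectral_edge_scores(G)
# In a graph Transformer, such scores could enter the attention as a structure-aware bias:
#   attn_logits[i, j] += alpha * scores.get((i, j), 0.0)
```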
Deceptive Synthetic Updates: Stealth Free-Rider Attack on Model Aggregation in Federated Learning
Federated Learning (FL) allows multiple clients to collaboratively train shared models without exchanging raw data, thereby preserving privacy. However, FL systems are vulnerable to malicious participants known as free-riders who exploit the collaborative nature without providing genuine data contributions. To expose this critical security threat, we introduce a novel stealth free-rider attack that leverages pre-trained forecasting models to generate highly realistic synthetic time-series data. Our approach enables malicious clients to deceive FL systems while obtaining benefits from fair participants' contributions, thereby undermining the integrity of federated networks. Numerical results on EEG-based sleep stage classification demonstrate that our attack maintains comparable performance with free-rider ratios up to 70% while causing catastrophic degradation when all clients are free-riders.
EI-KGC: A Knowledge Graph Completion Model Based on Fine-Grained Element Interactions
Most existing knowledge graph completion methods fail to model the fine-grained interactions among elements within triples, such as dependencies between entity attributes or contextual relationships involving predicates and entities. This limitation weakens their ability to infer implicit knowledge and hinders overall reasoning performance. To address this issue, we define a three-level classification of element interactions: Interactions between Elements at the Head Entity (IEH), Interactions between Elements at the Relationship (IER), and Interactions between Elements at the Tail Entity (IET), which systematically models the influence propagation patterns among knowledge graph triples at the element level. Based on these interaction types, we propose a novel Knowledge Graph Completion Model Based on Fine-Grained Element Interactions (EI-KGC). Our model captures both global structural patterns and semantic dependencies within triples by combining GNN propagation with fine-grained interaction modeling. Experimental results show that EI-KGC consistently outperforms traditional baseline models, demonstrating the effectiveness of our proposed model.
Improving Content Anomaly Detection on Social Media via Counterfactual Mitigation of Social Event-Induced Bias
Content anomaly detection on social media (SNS) plays a critical role in maintaining healthy online communities. However, the prevalence of major real-world social events can introduce model bias that skews, narrows, and undermines a detector's ability to perceive anomalies in content by contaminating and homogenizing the expressions posted during these events. To address this challenge, we propose SeiNS, a novel, model-agnostic plugin designed to mitigate the bias induced by prevalent social events. SeiNS comprises two components: a Social Event Extractor and an Event Involvement Perceptor. For a given post, SeiNS removes this bias by treating as spurious both the correlation between the label and the social event-induced homogenized content and the correlation between the label and the event-involvement representation, and by performing counterfactual learning on them during fine-tuning. We evaluate SeiNS on two benchmark datasets composed of real SNS posts under various major social events, where content anomaly respectively plays a negative and a positive role for social good. The results show that SeiNS significantly improves the robustness of SNS content anomaly detectors in dynamic, even unstable, online social environments.
Dynamic Reserve Price Design with Distributed Solving Algorithm
Unexpected advertising items in sponsored search may reduce users' reliance on organic search, resulting in hidden costs for the e-commerce platform. To address this problem and promote sustainable growth, we propose a dynamic reserve price design that incorporates the hidden cost into the auction mechanism to determine whether to sell the traffic, thereby ensuring a balanced relationship between revenue and user experience. Our dynamic reserve price design framework optimizes traffic sales by minimizing impacts on user experience while maintaining long-term incentives for advertisers to reveal their valuations truthfully. Furthermore, we introduce a distributed algorithm capable of computing reserve prices with billion-scale data in the production environment. Experiments involving offline evaluations and online A/B testing demonstrate that this method is simple and efficient, making it suitable for use in industrial production. This method has already been fully deployed in the production environment.
Improving Rare and Common ICD Coding via a Multi-Agent LLM-Based Approach
Large Language Models (LLMs) have shown strong performance in tasks such as zero- and few-shot information extraction from clinical text without domain-specific training. However, in the ICD coding task, LLMs often hallucinate key details and produce high-recall but low-precision outputs due to the high-dimensional and imbalanced nature of ICD code distributions. Existing LLM-based approaches typically fail to capture the complex, dynamic interactions among the human agents involved in real-world coding workflows, such as patients, physicians, and coders, and often lack interpretability and reliability. To address these challenges, we propose a novel multi-agent framework for ICD coding that simulates the real-world process using five role-specific LLM agents (patient, physician, coder, reviewer, and adjuster) and integrates the Subjective, Objective, Assessment, and Plan (SOAP) structure from Electronic Health Records to enhance performance. Evaluated on the MIMIC-III dataset, our method significantly outperforms zero-shot Chain-of-Thought prompting, self-consistency strategies, and LLM-designed agent baselines, particularly for rare codes. Ablation studies confirm the contribution of each agent role, and the system achieves competitive performance with state-of-the-art fine-tuned models, while offering better explainability and requiring no task-specific pre-training.
Position-Agnostic Probabilistic Generation for Robust Steganographic Text
With the rise of linguistic steganography, the robustness to withstand minor perturbations, such as textual edits or tokenization shifts, remains a critical challenge. To address this, we propose a novel robustness-enhancing method, Position-Agnostic Probabilistic Generation, which combines semantic clustering with probabilistic generation control. During decoding, the model is softly guided to prefer or avoid specific token sets through time-aware probability boosting, enabling robust bit embedding without relying on fixed token positions. These token sets are constructed from semantically coherent clusters derived from the language model's vocabulary and expanded to ensure fluency. A dynamic, time-aware boosting strategy is then applied to gradually amplify the likelihood of valid tokens throughout the generation. Experimental results demonstrate that our method consistently outperforms baselines in preserving hidden information under 10 types of perturbations.
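As a rough illustration of time-aware probability boosting, the following sketch assumes a HuggingFace-style LogitsProcessor interface; the boosting schedule, the maximum boost, and the token-set construction are simplified assumptions, not the paper's exact method.

```python
import torch
from transformers import LogitsProcessor

class TimeAwareBoost(LogitsProcessor):
    """Gradually amplify the logits of a preferred token set as generation proceeds."""

    def __init__(self, token_ids, max_boost=4.0, warmup_steps=20):
        self.token_ids = torch.tensor(token_ids)
        self.max_boost = max_boost
        self.warmup_steps = warmup_steps
        self.step = 0

    def __call__(self, input_ids, scores):
        # Linearly ramp the boost up to max_boost over the first warmup_steps decoding steps.
        strength = self.max_boost * min(1.0, self.step / self.warmup_steps)
        scores[:, self.token_ids] += strength
        self.step += 1
        return scores
```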
Token-Fusion: A Sparse Expert Routing Method for Multi-task Data Matching
Multi-task data matching-including entity matching, entity linking, and schema matching-is a fundamental task in data integration, yet remains challenging due to heterogeneous inputs and task-specific model designs. We propose Token-Fusion, a unified sparse expert method that integrates token-level dynamic expert routing, adaptive expert pool management, and a fusion strategy guided by both confidence and performance gain. Meanwhile, regularization losses are designed to encourage sparse and diverse expert activation for improved efficiency. Our extensive experimental evaluation on six public datasets demonstrates the effectiveness and efficiency of Token-Fusion in handling heterogeneous matching tasks, establishing it as a promising solution for unified and scalable multi-task data matching.
Few-Shot Knowledge Graph Completion via Transfer Knowledge from Similar Tasks
Knowledge graphs (KGs) are essential in many AI applications but often suffer from incompleteness, limiting their utility. Many relations in KGs have only a few examples, making it challenging to train accurate models. Few-shot learning offers a promising direction by enabling KG completion with only a small number of training triplets. However, most existing approaches treat each relation independently and fail to leverage shared information across tasks. In this paper, we introduce TransNet, a transfer learning method for few-shot KG completion that captures task relationships and reuses knowledge from related tasks. TransNet further incorporates meta-learning to effectively handle unseen relations. Experiments on standard benchmarks demonstrate that TransNet achieves strong performance compared to prior methods. Code and data will be released upon acceptance.
Monte Carlo Tree Search for Graph Reasoning in Large Language Model Agents
While large language models (LLMs) have achieved impressive results across many tasks, they remain prone to hallucinations, particularly in domains requiring substantial background knowledge. A common way to mitigate this issue is to incorporate external knowledge, often through retrieval-augmented generation (RAG). However, most existing RAG approaches focus solely on textual data and neglect an important aspect: the connections between pieces of knowledge. In domains such as scientific publishing, entities like papers, authors, and citations form rich graphs, where meaning emerges not only from individual texts but also from their relationships. To address this, we propose Graph-MCTS, a framework that enhances LLM reasoning by leveraging graph structures. Graph-MCTS uses Monte Carlo Tree Search (MCTS) to guide the model through structured exploration of graph-based knowledge. We evaluate Graph-MCTS across multiple LLM architectures and find that it consistently outperforms existing augmentation methods. These findings highlight the importance of structured, relational knowledge for improving the reasoning capabilities of LLMs. Code is available at https://github.com/lihuiliullh/Graph-MCTS
Powering Job Search at Scale: LLM-Enhanced Query Understanding in Job Matching Systems
Query understanding is essential in modern relevance systems, where user queries are often short, ambiguous, and highly context-dependent. Traditional approaches often rely on multiple task-specific Named Entity Recognition models to extract structured facets as seen in job search applications. However, this fragmented architecture is brittle, expensive to maintain, and slow to adapt to evolving taxonomies and language patterns. In this paper, we introduce a unified query understanding framework powered by a Large Language Model (LLM), designed to address these limitations. Our approach jointly models the user query and contextual signals such as profile attributes to generate structured interpretations that drive more accurate and personalized recommendations. The framework improves relevance quality in online A/B testing while significantly reducing system complexity and operational overhead. The results demonstrate that our solution provides a scalable and adaptable foundation for query understanding in dynamic web applications.
Sequential Difference Maximization: Generating Adversarial Examples via Multi-Stage Optimization
Efficient adversarial attack methods are critical for assessing the robustness of computer vision models. In this paper, we reconstruct the optimization objective for generating adversarial examples as "maximizing the difference between the non-true labels' probability upper bound and the true label's probability," and propose a gradient-based attack method termed Sequential Difference Maximization (SDM). SDM establishes a three-layer optimization framework of "cycle-stage-step." The procedures across cycles and across iterative steps are identical, while the optimization stages differ in their loss functions: in the initial stage, the negative probability of the true label is used as the loss function to compress the solution space; in subsequent stages, we introduce the Directional Probability Difference Ratio (DPDR) loss function to gradually increase the non-true labels' probability upper bound by compressing the irrelevant labels' probabilities. Experiments demonstrate that, compared with previous SOTA methods, SDM not only exhibits stronger attack performance but also achieves higher attack cost-effectiveness. Additionally, SDM can be combined with adversarial training methods to enhance their defensive effects. The code is available at https://github.com/X-L-Liu/SDM.
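A hedged PyTorch sketch of the stated objective follows: raise the highest non-true-label probability while suppressing the true label, as in a gradient-ascent attack step. The exact DPDR loss and the cycle-stage-step schedule are the paper's; this only illustrates the difference being maximized.

```python
import torch

def sdm_style_objective(logits, y):
    """Difference between the highest non-true-label probability and the true-label probability."""
    p = torch.softmax(logits, dim=-1)
    p_true = p.gather(1, y[:, None]).squeeze(1)
    p_other = p.scatter(1, y[:, None], 0.0).max(dim=1).values  # zero out the true class, take the max
    return (p_other - p_true).mean()

# One ascent step on an input perturbation delta (epsilon-ball projection omitted):
#   obj = sdm_style_objective(model(x + delta), y); obj.backward()
#   delta = (delta + step * delta.grad.sign()).detach().requires_grad_()
```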
Exploring Reasoning-Infused Text Embedding with Large Language Models for Zero-Shot Dense Retrieval
Transformer-based models such as BERT and E5 have significantly advanced text embedding by capturing rich contextual representations. However, many complex real-world queries require sophisticated reasoning to retrieve relevant documents beyond surface-level lexical matching, where encoder-only retrievers often fall short. Decoder-only large language models (LLMs), known for their strong reasoning capabilities, offer a promising alternative. Despite this potential, existing LLM-based embedding methods primarily focus on contextual representation and do not fully exploit the reasoning strength of LLMs. To bridge this gap, we propose Reasoning-Infused Text Embedding (RITE), a simple but effective approach that integrates logical reasoning into the text embedding process using generative LLMs. RITE builds upon existing language model embedding techniques by generating intermediate reasoning texts in the token space before computing embeddings, thereby enriching representations with inferential depth. Experimental results on BRIGHT, a reasoning-intensive retrieval benchmark, demonstrate that RITE significantly enhances zero-shot retrieval performance across diverse domains, underscoring the effectiveness of incorporating reasoning into the embedding process.
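A minimal sketch of the idea under stated assumptions: first generate an intermediate reasoning trace with a generative LLM, then embed the query together with that trace. The `generate` and `embed` callables and the prompt wording are hypothetical placeholders for whatever LLM and embedding backends are used.

```python
def reasoning_infused_embedding(query, generate, embed):
    """Enrich a query representation with an LLM-generated reasoning trace before encoding."""
    reasoning = generate(
        "Think step by step about what information would be relevant to this query:\n" + query
    )
    # Embed the query concatenated with its inferential context.
    return embed(query + "\n" + reasoning)
```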
CoinCLIP: A Multimodal Framework for Assessing Viability in Web3 Memecoins
The rapid growth of memecoins within the Web3 ecosystem, driven by platforms like Pump.fun, has made it easier for anyone to create tokens. However, this democratization has also led to an explosion of low-quality or bot-generated projects, often motivated by short-term financial gain. This overwhelming influx of speculative tokens creates a challenge in distinguishing viable memecoins from those that are unlikely to succeed. To address this issue, we introduce CoinVibe, a comprehensive multimodal dataset designed to evaluate the viability of memecoins. CoinVibe integrates textual descriptions, visual content (logos), and community data (user comments, timestamps, and number of likes) to provide a holistic view of a memecoin's potential. In addition, we present CoinCLIP, a novel framework that leverages the Contrastive Language-Image Pre-Training (CLIP) model, augmented with lightweight modules and community data integration, to improve classification accuracy. By combining visual and textual representations with community insights, CoinCLIP provides a robust, data-driven approach to filter out low-quality or bot-driven projects. This research aims to help creators and investors identify high-potential memecoins, while also offering valuable insights into the factors that contribute to their long-term success.
KP-Agent: Keyword Pruning in Sponsored Search Advertising via LLM-Powered Contextual Bandits
Sponsored search advertising (SSA) requires advertisers to constantly adjust keyword strategies. While bid adjustment and keyword generation are well studied, keyword pruning, i.e., refining keyword sets to enhance campaign performance, remains underexplored. This paper addresses critical inefficiencies in current practices, as evidenced by a dataset containing 0.5 million SSA records from a pharmaceutical advertiser on the search engine of Meituan, China's largest delivery platform. We propose KP-Agent, an LLM-based agentic system with a domain tool set and a memory module. By modeling keyword pruning within a contextual bandit framework, KP-Agent generates code snippets to refine keyword sets through reinforcement learning. Experiments show KP-Agent improves cumulative profit by up to 49.28% over baselines.
ORCA: Mitigating Over-Reliance for Multi-Task Dwell Time Prediction with Causal Decoupling
Dwell time (DT) is a critical post-click metric for evaluating user preference in recommender systems, complementing the traditional click-through rate (CTR). Although multi-task learning is widely adopted to jointly optimize DT and CTR, we observe that multi-task models systematically collapse their DT predictions to the shortest and longest bins, under-predicting the moderate durations. We attribute this under-representation of moderate-duration bins to over-reliance on the spurious CTR-DT correlation, and propose ORCA to address it with causal decoupling. Specifically, ORCA explicitly models and subtracts CTR's negative transfer while preserving its positive transfer. We further introduce (i) feature-level counterfactual intervention, and (ii) a task-interaction module with instance inverse-weighting, weakening the CTR-mediated effect and restoring direct DT semantics. ORCA is model-agnostic and easy to deploy. Experiments show an average 10.6% lift in DT metrics without harming CTR. Code is available at https://github.com/Chrissie-Law/ORCA-Mitigating-Over-Reliance-for-Multi-Task-Dwell-Time-Prediction-with-Causal-Decoupling.
Anchor-based Pairwise Comparison via Large Language Model for Recommendation Reranking
In recommender systems, reranking is an important post-processing technique to reorder the items in a recommendation list. Recently, several LLM-based reranking approaches have been proposed to exploit the semantic reasoning capability of a large language model. However, they are sensitive to the order of the input list and often incur large computational overheads. To address these limitations, we propose APCR, an Anchor-based Pairwise Comparison method for recommendation Reranking. It first leverages an LLM to conduct pairwise comparisons between the recommended items and an anchor, and computes preference scores to produce an LLM-suggested list. We then propose a position-aware list reranking technique that reorders the items in the recommendation list by considering their positions in the LLM-suggested list to output the final list. Experiments on real-world datasets show that APCR outperforms state-of-the-art LLM-based reranking techniques in terms of list ranking performance.
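As a hedged sketch of anchor-based scoring (one LLM comparison per candidate rather than all pairs), the following assumes a hypothetical `llm_prefers` judge returning the probability that an item is preferred over the anchor; the prompt and aggregation are illustrative, not the authors' implementation.

```python
def anchor_rerank(candidates, anchor, llm_prefers):
    """Rank candidates by their LLM-judged preference over a single anchor item."""
    scores = {item: llm_prefers(item, anchor) for item in candidates}  # linear number of LLM calls
    return sorted(candidates, key=lambda it: scores[it], reverse=True)
```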
Bridging the Gap between Knowledge Graphs and LLMs for Multi-hop Question Answering
To achieve multi-hop question answering over knowledge graphs (KGQA), many studies have explored converting retrieved subgraphs into textual form and feeding them into large language models (LLMs) to leverage their reasoning capabilities. However, due to the linear and discrete nature of text sequences, model performance may degrade when handling complex questions. To this end, we propose a novel structure-text knowledge synergistic method, BrikQA, which bridges the knowledge gap between knowledge graphs (KGs) and LLMs for multi-hop KGQA. LLMs and KGs complement each other by leveraging explicit topological patterns and implicit knowledge mining to enhance knowledge understanding and address sparsity issues. Experimental results on various datasets demonstrate that BrikQA outperforms state-of-the-art baselines. Our source code is available at https://github.com/shijielaw/BrikQA.
From Post To Personality: Harnessing LLMs for MBTI Prediction in Social Media
Personality prediction from social media posts is a critical task that implies diverse applications in psychology and sociology. The Myers-Briggs Type Indicator (MBTI), a popular personality inventory, has been traditionally predicted by machine learning (ML) and deep learning (DL) techniques. Recently, the success of Large Language Models (LLMs) has revealed their huge potential in understanding and inferring personality traits from social media content. However, directly exploiting LLMs for MBTI prediction faces two key challenges: the hallucination problem inherent in LLMs and the naturally imbalanced distribution of MBTI types in the population. In this paper, we propose PostToPersonality (P2P), a novel LLM-based framework for MBTI prediction from social media posts of individuals. Specifically, P2P leverages Retrieval-Augmented Generation with in-context learning to mitigate hallucination in LLMs. Furthermore, we fine-tune a pre-trained LLM to improve model specification in MBTI understanding with synthetic minority oversampling, which balances the class imbalance by generating synthetic samples. Experiments conducted on a real-world social media dataset demonstrate that P2P achieves state-of-the-art performance compared with 10 ML/DL baselines.
Image Hashing Based on Hamming Ball Spacing
Image hashing has been widely used in large-scale image retrieval due to its benefits in storage efficiency and retrieval speed. Recently, methods based on hash centers have achieved impressive retrieval performance, aiming to assign mutually separated hash codes as center points for each category and to learn compact binary codes by minimizing intra-class variance. However, current methods tend to generate codebooks in advance, which are then used as pre-defined hash centers, leading to compromised quality of hash centers on instance-level datasets with excessive categories. To address this problem, we design a training strategy that learns hash centers by constraining the margin of category clusters in Hamming space, during which the Gilbert-Varshamov bound and the upper bound of Hamming ball packing from coding theory are utilized to determine the range of margins. Finally, we jointly optimize the hash encoder and hash centers to improve retrieval performance. Extensive experiments on more challenging large-scale instance-level datasets demonstrate that our method effectively overcomes the limitation on retrieval performance imposed by the number of categories. Code is at: https://github.com/GrimmAI/Hamming-Ball-Hashing.git.
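To illustrate how coding-theory bounds can bracket a feasible margin, here is a hedged sketch: for C categories and b-bit codes, the Gilbert-Varshamov bound yields an achievable minimum Hamming distance, while the Hamming (ball-packing) bound gives an upper limit. This mirrors the idea described above, not the paper's exact procedure.

```python
from math import comb

def ball(b, r):
    """Number of binary strings within Hamming radius r of a b-bit codeword."""
    return sum(comb(b, j) for j in range(r + 1))

def margin_range(num_classes, bits):
    # Gilbert-Varshamov: a code with num_classes codewords at distance d exists if 2^b / V(b, d-1) >= C.
    gv = max(d for d in range(1, bits + 1)
             if 2 ** bits // ball(bits, d - 1) >= num_classes)
    # Hamming bound: packing is impossible once C exceeds 2^b / V(b, floor((d-1)/2)).
    packing = max(d for d in range(1, bits + 1)
                  if 2 ** bits // ball(bits, (d - 1) // 2) >= num_classes)
    return gv, packing  # (achievable margin, packing upper bound)

print(margin_range(num_classes=1000, bits=64))
```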
kNNBE: Incorporating Labeled Sentences in Bi-encoder Inference for Fast and Accurate Skill Mapping
Skill mapping is a key task in the Human Resources domain. It consists of identifying ontology-defined skills in job texts. Among the most successful approaches applied to skill mapping, bi-encoders offer efficient inference but struggle with fine-grained skill distinctions, particularly under limited supervision. Cross-encoder and LLM-based reranking approaches, while accurate, are computationally expensive and usually infeasible to adopt in real-world scenarios. We propose kNNBE, a hybrid inference method that augments bi-encoder similarity scores with k-nearest labeled sentences drawn from a synthetic memory bank. kNNBE improves both prediction accuracy and generalization to unseen skills while retaining high throughput. Extensive experiments on three benchmark datasets show that kNNBE rivals state-of-the-art rerankers in accuracy while being orders of magnitude faster.
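A minimal sketch of the blending idea, under assumed interfaces: the bi-encoder similarity for each skill is combined with a soft vote from the k nearest labeled sentences in a memory bank. Normalized embeddings, the mixing weight, and the vote form are illustrative assumptions.

```python
import numpy as np

def knn_be_score(q_emb, skill_embs, bank_embs, bank_labels, k=8, alpha=0.7):
    """q_emb: (d,) query embedding; skill_embs: (S, d) skill embeddings;
    bank_embs: (M, d) labeled sentence embeddings; bank_labels: (M,) skill index per sentence."""
    bi_scores = skill_embs @ q_emb                  # bi-encoder similarities per skill
    sims = bank_embs @ q_emb                        # similarity to the memory bank
    nn_idx = np.argsort(-sims)[:k]                  # k nearest labeled sentences
    knn_scores = np.zeros(len(skill_embs))
    for i in nn_idx:
        knn_scores[bank_labels[i]] += sims[i]       # soft vote for the neighbour's skill label
    return alpha * bi_scores + (1 - alpha) * knn_scores
```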
Leveraging Intra-Modal Consistency for Cross-Modal Alignment and Retrieval
Cross-modal retrieval aims to match videos and texts by mapping them into a shared feature space. Most existing approaches achieve alignment through contrastive learning based on one-to-one supervised pairs. However, these methods rely heavily on supervised signals and do not fully use the unsupervised semantic relationships within each modality. As a result, samples that are semantically similar may end up far apart in the shared space, which hurts retrieval performance. To solve this problem, we propose a method called Leveraging Intra-Modal Consistency for Cross-Modal Alignment and Retrieval (LICA). Our method introduces a consistency constraint between intra-modal similarities and cross-modal similarity distributions. In this way, samples that are close in meaning stay closer together in the shared space. Experiments on standard text-video retrieval benchmarks show that LICA helps optimize the distribution of the cross-modal feature space and improves retrieval accuracy.
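A hedged PyTorch sketch of one possible consistency term: align the row-wise distribution of cross-modal (video-to-text) similarities with the distribution of intra-modal (text-to-text) similarities. The temperature, the KL direction, and the choice of anchor modality are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def consistency_loss(video_emb, text_emb, tau=0.05):
    """Match cross-modal similarity distributions to intra-modal ones within a batch."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    cross = F.log_softmax(v @ t.T / tau, dim=-1)   # video-to-text similarity distribution (log-probs)
    intra = F.softmax(t @ t.T / tau, dim=-1)       # text-to-text similarity distribution (probs)
    return F.kl_div(cross, intra, reduction="batchmean")
```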
Multi-Task Learning through Hierarchical Information Sharing and Transfer
In this work, we propose a novel Hierarchical Information Sharing and Transfer (HIST) framework for multi-task learning, which simultaneously employs an implicit shared-bottom pattern and explicit sequential transfer at the tower level. In particular, a multi-level gating mixture-of-experts is presented for efficient bottom-level sharing. Further, a self-attention mechanism is adopted for information transfer between task-specific towers. Such a hierarchical task interaction scheme leads to a remarkable enhancement in multi-task learning settings. Extensive experiments on four subsets of the AliExpress dataset unequivocally demonstrate that HIST consistently outperforms the current state-of-the-art methods.
LLM-Enhanced Linear Autoencoders for Recommendation
Large language models (LLMs) have been widely adopted to enrich the semantic representation of textual item information in recommender systems. However, existing linear autoencoders (LAEs) that incorporate textual information rely on sparse word co-occurrence patterns, limiting their ability to capture rich textual semantics. To address this, we propose L3AE, the first integration of LLMs into the LAE framework. L3AE effectively integrates the heterogeneous knowledge of textual semantics and user-item interactions through a two-phase optimization strategy. (i) L3AE first constructs a semantic item-to-item correlation matrix from LLM-derived item representations. (ii) It then learns an item-to-item weight matrix from collaborative signals while distilling semantic item correlations as regularization. Notably, each phase of L3AE is optimized through closed-form solutions, ensuring global optimality and computational efficiency. Extensive experiments demonstrate that L3AE consistently outperforms state-of-the-art LLM-enhanced models on three benchmark datasets, achieving gains of 27.6% in Recall@20 and 39.3% in NDCG@20. The source code is available at https://github.com/jaewan7599/L3AE_CIKM2025.
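As a hedged illustration of a closed-form item-to-item model regularized toward an LLM-derived semantic correlation matrix S, consider the ridge-style objective below. The exact L3AE objective is the paper's; the formulation, hyperparameters, and the post-hoc zero-diagonal step are simplifying assumptions.

```python
import numpy as np

def fit_item_weights(X, S, lam=100.0, gamma=10.0):
    """X: (users, items) interaction matrix; S: (items, items) semantic item correlations.
    Minimizes ||X - XB||^2 + lam*||B||^2 + gamma*||B - S||^2 in closed form."""
    G = X.T @ X
    n = G.shape[0]
    # Setting the gradient to zero gives (G + (lam + gamma) I) B = G + gamma S.
    B = np.linalg.solve(G + (lam + gamma) * np.eye(n), G + gamma * S)
    np.fill_diagonal(B, 0.0)  # simplification of the usual zero-diagonal constraint in linear AEs
    return B

# scores = user_history_row @ B   # rank unseen items by predicted affinity
```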
Bayesian Privacy Guarantee for User History in Sequential Recommendation Using Randomised Response
Sequential recommendation systems play an important role in delivering personalised user experiences, yet they rely heavily on detailed user history, raising serious privacy concerns. In this work, we introduce a novel framework that integrates a randomised response mechanism into sequential recommendation to provide strong privacy guarantees while preserving recommendation effectiveness. By obfuscating user history through controlled probabilistic item substitution based on semantic similarity, our approach ensures that released sequences protect individual behaviour with provable Bayesian posterior privacy. We further propose training strategies tailored for privacy-filtered data, including a frequency-based vocabulary expansion method inspired by subword tokenisation. Experiments on four real-world datasets demonstrate that our approach preserves recommendation quality under strong privacy constraints and outperforms existing baselines even without applying privacy filters.
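The obfuscation step can be sketched as follows: with probability p the true item is kept, otherwise a semantically similar item is sampled from its neighbourhood. The keep probability, the neighbourhood size, and the use of normalized item embeddings are assumed parameters for illustration only.

```python
import numpy as np

def randomised_response(sequence, item_embs, p_keep=0.8, k=20, rng=np.random.default_rng(0)):
    """Release an obfuscated copy of a user's item sequence via semantic-neighbour substitution."""
    released = []
    for item in sequence:
        if rng.random() < p_keep:
            released.append(item)
        else:
            sims = item_embs @ item_embs[item]
            neighbours = np.argsort(-sims)[1:k + 1]   # skip index 0, i.e. the item itself
            released.append(int(rng.choice(neighbours)))
    return released
```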
GraFS: An Integrated GNN-LLM Approach for Inferring Best Functional Substitute Products
Identifying and ranking the most functionally substitutable products is a key challenge for improving product selection and recommendations in e-commerce. Functionally substitutable products are items that share core functional attributes, which allow them to serve similar purposes despite variations in product features. Traditional methods struggle with (1) accurately capturing functional similarities between products with unique attributes, and (2) ranking substitutes that best optimize product selection and meet customer needs. We introduce GraFS (Graph-enabled Large Language Model framework for Functional Substitute Selection), which combines Large Language Models (LLMs) to extract textual, semantic similarities from product descriptions and Graph Neural Networks (GNNs) that learn substitution patterns from customer behavior. Specifically, LLMs generate functional similarity embeddings from unstructured, product text attributes, while GNNs aggregate these embeddings with graph-structured customer data to predict substitution scores and rank products within functional groups. This dual approach enables GraFS to identify the top-k most suitable functional substitutes that maximize purchase likelihood while maintaining product diversity. Experiments on four large-scale e-commerce review datasets demonstrate that our framework significantly improves upon conventional methods and better captures relationships among functional substitute products, with gains of up to 19% in NDCG@10 and Pre@10 over baselines.
Multi-modal Adaptive Mixture of Experts for Cold-start Recommendation
Recommendation systems have faced significant challenges in cold-start scenarios, where new items with a limited history of interaction need to be effectively recommended to users. Though multimodal data (e.g., images, text, and audio) offer rich information to address this issue, existing approaches often employ simplistic integration methods such as concatenation, average pooling, or fixed weighting schemes, which fail to capture the complex relationships between modalities. Our study proposes a novel Mixture of Experts framework for multimodal cold-start recommendation (MAMEX), which dynamically leverages latent representation from different modalities. MAMEX utilizes modality-specific expert networks and introduces a learnable gating mechanism that adaptively weights the contribution of each modality based on its content characteristics. This approach enables MAMEX to emphasize the most informative modalities for each item while maintaining robustness when certain modalities are less relevant or missing. Extensive experiments on benchmark datasets show that MAMEX outperforms state-of-the-art models with superior accuracy and adaptability.
Solar Forecasting with Causality: A Graph-Transformer Approach to Spatiotemporal Dependencies
Accurate solar forecasting underpins effective renewable energy management. We present SolarCAST, a causally informed model predicting future global horizontal irradiance (GHI) at a target site using only historical GHI from site X and nearby stations S, unlike prior work that relies on sky-camera or satellite imagery requiring specialized hardware and heavy preprocessing. To deliver high accuracy with only public sensor data, SolarCAST models three classes of confounding factors behind X-S correlations using scalable neural components: (i) observable synchronous variables (e.g., time of day, station identity), handled via an embedding module; (ii) latent synchronous factors (e.g., regional weather patterns), captured by a spatio-temporal graph neural network; and (iii) time-lagged influences (e.g., cloud movement across stations), modeled with a gated transformer that learns temporal shifts. It outperforms leading time-series and multimodal baselines across diverse geographical conditions, and achieves a 25.9% error reduction over the top commercial forecaster, Solcast. SolarCAST offers a lightweight, practical, and generalizable solution for localized solar forecasting. Code available at https://github.com/YananNiu/SolarCAST
Externalizing Social-Cognitive Structures for User Modeling: Toward Theory-Driven Profiling with LLMs
In this paper, we propose TRIPLE (TPB-dRIven Profiling with LLM rEfinement), a dynamic profiling framework that incorporates the Theory of Planned Behavior (TPB) into user profile modeling. Our method (1) extracts TPB components from historical text data to construct an initial user profile, (2) iteratively refines this profile by analyzing discrepancies between predicted and actual behaviors, and (3) continuously updates the user's state by incorporating newly arriving text. We evaluate TRIPLE on the LaMP datasets, focusing on rating prediction and personalized tweet paraphrasing tasks, using multiple open-source large language models. Experimental results demonstrate that TRIPLE consistently outperforms existing profiling methods across all evaluation settings. Qualitative analysis confirms that TRIPLE captures the psychological and social mechanisms underlying users' product evaluation and description. These findings provide empirical evidence that theory-driven user profiling can significantly improve personalization performance in recommender systems and related applications. Our implementation and examples of generated profiles are available at https://yestaehyung.github.io/cikm25-triple/.
Modeling Irregular Astronomical Time Series with Neural Stochastic Delay Differential Equations
Astronomical time series from large-scale surveys like LSST are often irregularly sampled and incomplete, posing challenges for classification and anomaly detection. We introduce a new framework based on Neural Stochastic Delay Differential Equations (Neural SDDEs) that combines stochastic modeling with neural networks to capture delayed temporal dynamics and handle irregular observations. Our approach integrates a delay-aware neural architecture, a numerical solver for SDDEs, and mechanisms to robustly learn from noisy, sparse sequences. Experiments on irregularly sampled astronomical data demonstrate strong classification accuracy and effective detection of novel astrophysical events, even with partial labels. This work highlights Neural SDDEs as a principled and practical tool for time series analysis under observational constraints.
FairSplit: Mitigating Bias in Graph Neural Networks through Sensitivity-based Edge Partitioning
Fairness in machine learning has become increasingly crucial, particularly in graph-based models where biased representations can reinforce societal inequalities. Traditional fairness-aware learning methods on graphs focus on graph rewiring, debiasing node embeddings, adversarial learning, and additional fairness constraints. However, these approaches often struggle to balance fairness and task performance. We propose a novel edge partitioning strategy that creates two distinct subgraphs, maintaining a balance between bias and diversity. We categorize edges as homophilic or heterophilic depending on the sensitive attribute of the corresponding node pairs. An edge is s-homophilic if it joins two nodes with the same sensitivity value, otherwise s-heterophilic. The partition splits the input graph into two subgraphs, both containing all nodes, one with only s-homophilic edges and the other with s-heterophilic ones. Using a Graph Neural Network (GNN), we obtain independent node representations from both graphs, which are then aggregated into a unified node embedding. To enforce fairness, we jointly optimize a primary task loss and a fairness loss, ensuring predictive accuracy and bias mitigation. We evaluate our approach on three benchmark datasets and find that it achieves improved fairness metrics while maintaining accuracy comparable to that of existing state-of-the-art methods.
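The partition step can be made concrete with a small sketch: edges are split into s-homophilic and s-heterophilic subgraphs that share the full node set, and a GNN would then encode each subgraph separately before aggregation. The tensor layout and aggregation choice are illustrative assumptions.

```python
import torch

def partition_edges(edge_index, sensitive):
    """edge_index: (2, E) tensor of edges; sensitive: (N,) sensitive-attribute value per node."""
    src, dst = edge_index
    homo_mask = sensitive[src] == sensitive[dst]
    homo_edges = edge_index[:, homo_mask]       # s-homophilic subgraph (same sensitive value)
    hetero_edges = edge_index[:, ~homo_mask]    # s-heterophilic subgraph (different values)
    return homo_edges, hetero_edges

# h = aggregate(gnn(x, homo_edges), gnn(x, hetero_edges))  # e.g., mean or concatenation
```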
Eliminating Sentiment Bias in Recommender Systems by Counterfactual Inference
Sentiment bias is a newly discovered phenomenon in Recommender Systems (RSs). Critical users and niche items are disadvantaged by such unfair recommendations. To mitigate this bias, we propose a novel approach based on counterfactual inference, which is implemented in two stages. Experimental results validate that our model achieves comparable performance in rating prediction, providing better recommendations and effectively mitigating sentiment bias. To the best of our knowledge, this is the first work to employ counterfactual inference for sentiment bias mitigation in RSs.
Beyond Masking: Landmark-based Representation Learning and Knowledge-Distillation for Audio-Visual Deepfake Detection
Audio-visual deepfake detection methods demonstrate strong performance on academic datasets but fail significantly when applied to real-world data. To address the shortcomings of previous approaches, we utilize the dynamic information of facial landmarks. First, we propose Landmark-based Distillation (LBD), motivated by I-JEPA's representation learning approach. LBD utilizes KL-divergence to align facial landmark predictions from visual and audio encoders, enforcing focus on geometric facial features rather than spurious background information. Second, we introduce Multimodal Temporal Information Alignment (MTIA), which employs contrastive learning to enhance temporal consistency between audio and visual representations. We conduct experiments on academic datasets and web-based deepfakes collected from diverse social media platforms, serving as real-world examples. Our proposed landmark-guided distillation framework achieves computational efficiency while improving multimodal video deepfake detection performance across a diverse range of deepfakes compared to existing methods. The code is available at https://github.com/Ckck12/Beyond-Masking.
Federated Gradient Boosting for Financial Fraud Detection: An Empirical Study in the Banking Sector
The development of effective fraud detection systems (FDS) is hindered by strict data privacy regulations that prevent centralized data sharing. Federated learning (FL) has emerged as a promising alternative, enabling collaborative model training without exposing sensitive data. While FL has been explored in the healthcare domain, research on its application to financial fraud detection remains relatively limited. Specifically, FL research on real-world banking fraud types, with detailed customer, account, and transaction data, remains underexplored. We present the first empirical study of federated gradient boosting models for financial fraud detection in the banking sector, motivated by their superior performance over deep learning models on tabular fraud data. We evaluate and compare four representative federated gradient boosting models using both a private multi-fraud banking dataset from the Financial Security Institute (FSI) and a publicly available banking dataset, under various scenarios. Key findings include the consistent superiority of FedXGBBagging (a federated gradient boosting model), general vulnerability to data quantity skew, performance instability under bank join/dropout, and limitations in detecting localized banking fraud types such as ATM skimming. The findings from our empirical study highlight challenges and design considerations for deploying FL-based FDSs in the banking sector.
ASAP: Unsupervised Post-training with Label Distribution Shift Adaptive Learning Rate
In real-world applications, machine learning models face online label shift, where label distributions change over time. Effective adaptation requires careful learning rate selection: too low slows adaptation and too high causes instability. We propose ASAP (Adaptive Shift Aware Post-training), which dynamically adjusts the learning rate by computing the cosine distance between current and previous unlabeled outputs and mapping it within a bounded range. ASAP requires no labels, model ensembles, or past inputs, using only the previous softmax output for fast, lightweight adaptation. Experiments across multiple datasets and shift scenarios show ASAP consistently improves accuracy and efficiency, making it practical for unsupervised model adaptation.
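The core rule can be sketched in a few lines: map the cosine distance between consecutive (unlabeled) softmax output distributions into a bounded learning-rate range. The bounds and the linear mapping are assumptions for illustration, not the paper's exact schedule.

```python
import numpy as np

def adaptive_lr(prev_probs, curr_probs, lr_min=1e-5, lr_max=1e-2):
    """prev_probs, curr_probs: mean softmax outputs over the previous and current batch."""
    cos = np.dot(prev_probs, curr_probs) / (
        np.linalg.norm(prev_probs) * np.linalg.norm(curr_probs) + 1e-12
    )
    shift = 1.0 - cos  # larger cosine distance indicates a larger label-distribution shift
    return lr_min + (lr_max - lr_min) * min(shift, 1.0)
```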
SemiSegECG: A Multi-Dataset Benchmark for Semi-Supervised Semantic Segmentation in ECG Delineation
Electrocardiogram (ECG) delineation, the segmentation of meaningful waveform features, is critical for clinical diagnosis. Despite recent advances using deep learning, progress has been limited by the scarcity of publicly available annotated datasets. Semi-supervised learning presents a promising solution by leveraging abundant unlabeled ECG data. In this study, we present SemiSegECG, the first systematic benchmark for semi-supervised semantic segmentation (SemiSeg) in ECG delineation. We curated and unified multiple public datasets, including previously underused sources, to support robust and diverse evaluation. We adopted five representative SemiSeg algorithms from computer vision, implemented them on two different architectures, a convolutional network and a transformer, and evaluated them in two settings: in-domain and cross-domain. Additionally, we propose ECG-specific training configurations and augmentation strategies and introduce a standardized evaluation framework. Our results show that the transformer outperforms the convolutional network in semi-supervised ECG delineation. We anticipate that SemiSegECG will serve as a foundation for advancing semi-supervised ECG delineation methods and will facilitate further research in this domain. The code repository is available at https://github.com/bakqui/semi-seg-ecg.
A Pivot-Enhanced Question Answering Framework: Using Iterative Sub-Question Decomposition and Answer-to-Question Verification
Question Answering (QA) in low-resource languages remains a significant challenge due to the scarcity of high-quality training data. To address this, we propose a robust framework for low-resource QA. Our framework enhances performance through the integration of pivot-based translation, sub-question decomposition, and semantic consistency verification. Our proposed approach utilizes pivoting by translating questions into a high-resource language, and then translating the answer back into the original language. To improve the handling of complex queries, we introduce sub-question decomposition, which breaks down the original question into simpler sub-units for independent QA. Also, we incorporate a reverse QA mechanism that generates a new question from the predicted answer and measures its semantic similarity to the original question, thereby validating answer consistency. Evaluated on the TyDi QA benchmark, the proposed framework achieves a 19.21 chrF score and 0.67 BERTScore, corresponding to at least a 12% improvement in metrics over direct generation baselines.
FnRGNN: Distribution-aware Fairness in Graph Neural Network
Graph Neural Networks (GNNs) excel at learning from structured data, yet fairness in regression tasks remains underexplored. Existing approaches mainly target classification and representation-level debiasing, which cannot fully address the continuous nature of node-level regression. We propose FnRGNN, a fairness-aware in-processing framework for GNN-based node regression that applies interventions at three levels: (i) structure-level edge reweighting, (ii) representation-level alignment via MMD, and (iii) prediction-level normalization through Sinkhorn-based distribution matching. This multi-level strategy ensures robust fairness under complex graph topologies. Experiments on four real-world datasets demonstrate that FnRGNN reduces group disparities without sacrificing performance. Code is available at https://github.com/sybeam27/FnRGNN.
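The representation-level alignment term can be illustrated with a minimal RBF-kernel MMD between the node representations of two sensitive groups; the bandwidth and the weighting of the term are assumptions, not the paper's exact configuration.

```python
import torch

def mmd_rbf(x, y, sigma=1.0):
    """Squared MMD with an RBF kernel between two sets of representations."""
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# loss = task_loss + lambda_fair * mmd_rbf(h[group == 0], h[group == 1])
```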
TwinBandit Prompt Optimizer: Adaptive Prompt Optimization via Synergistic Dual MAB-Guided Feedback
A common deficiency in Automatic Prompt Engineering (APE) is the failure to strategically employ specific failure feedback in concert with the adaptive and coordinated selection of diverse generation strategies. To address this deficiency, we introduce TwinBandit Prompt Optimizer (TBPO), an APE framework that employs a synergistic dual Multi-Armed Bandit (MAB) mechanism for adaptive prompt generation, applicable to black-box Large Language Models (LLMs). The first MAB identifies the most challenging training instances, informing an LLM-driven feedback pipeline responsible for generating (1) parameterized changes to guide prompt evolution and (2) n-shot example configurations. A second MAB then adaptively selects and ranks the proposed changes based on the empirical performance of previously generated prompts. TBPO then combines these ranked modifications and n-shot configurations to generate child prompts with targeted enhancements that specifically address these challenging instances. Iteratively applying this process to the best-performing prompts, TBPO forms a closed-loop cycle, strategically generating and exploring a tree of enhanced prompts. Benchmarking shows that TBPO achieves stronger performance compared to state-of-the-art APE baselines, highlighted by a 3.87% higher exact match rate on the GPQA-Diamond dataset. Our approach offers a more targeted and adaptive APE method by strategically learning from common failures and leveraging empirically validated generation strategies for a given dataset. More information is available at https://github.com/yjpark-pub/tbpo_release.
EZ-Sort: Efficient Pairwise Comparison via Zero-Shot CLIP-Based Pre-Ordering and Human-in-the-Loop Sorting
Pairwise comparison is often favored over absolute rating or ordinal classification in subjective or difficult annotation tasks due to its improved reliability. However, exhaustive comparisons require a massive number of annotations (O(n²)). Recent work [8] has reduced the annotation burden (O(n log n)) by actively sampling pairwise comparisons using a sorting algorithm. We further improve annotation efficiency by (1) roughly pre-ordering items using the CLIP (Contrastive Language-Image Pre-training) model hierarchically without training, and (2) replacing easy, obvious human comparisons with automated ones. The proposed EZ-Sort first produces a CLIP-based zero-shot pre-ordering, then initializes bucket-aware Elo scores, and finally runs an uncertainty-guided human-in-the-loop MergeSort. We validated our method using datasets from three domains: face-age estimation (FGNET) [10], historical image chronology (DHCI) [14], and retinal image quality assessment (EyePACS) [6]. EZ-Sort reduced human annotation cost by 90.5% compared to exhaustive pairwise comparisons and by 19.8% compared to prior work [8] (at n = 100), while improving or maintaining inter-rater reliability. These results demonstrate that combining CLIP-based priors with uncertainty-aware sampling yields an efficient and scalable solution for pairwise ranking.
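The uncertainty-guided comparison step can be sketched as follows: during MergeSort, a human is queried only when the Elo-based win probability between two items is uncertain; otherwise the comparison is resolved automatically from the pre-ordering. The confidence threshold and the standard Elo win-probability form are assumptions for illustration.

```python
def compare(a, b, elo, ask_human, threshold=0.85):
    """Resolve a MergeSort comparison automatically when the Elo gap is decisive."""
    p_a_wins = 1.0 / (1.0 + 10 ** ((elo[b] - elo[a]) / 400))  # standard Elo win probability
    if max(p_a_wins, 1.0 - p_a_wins) >= threshold:
        return a if p_a_wins >= 0.5 else b                    # confident pair: automate
    return ask_human(a, b)                                    # uncertain pair: query an annotator
```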
LHMformer: Long-Range Historical Memory-Enhanced Transformer for Traffic Forecasting
As a typical task of multivariate time series analysis, traffic forecasting has always held significant application value across various fields. Most existing models are constrained by fixed input lengths, making it difficult to avoid ambiguity in spatio-temporal features. We revisit the synergistic effects of Transformer-based models and TCN-MLP models in spatio-temporal data forecasting and propose the Long-Range Historical Memory-Enhanced Transformer (LHMformer). The network consists of a transformer module for processing short-term inputs and a historical feature extraction module for processing long-term inputs. Experiments on six real-world traffic datasets show that the proposed method achieves state-of-the-art results. Follow-up experiments demonstrate that the historical feature extraction module is a key component in solving traffic flow forecasting problems.
Towards Unveiling Predictive Uncertainty Vulnerabilities in the Context of the Right to Be Forgotten
Currently, various uncertainty quantification methods have been proposed to provide certainty and probability estimates for deep learning models' label predictions. Meanwhile, with the growing demand for the right to be forgotten, machine unlearning has been extensively studied as a means to remove the impact of requested sensitive data from a pre-trained model without retraining the model from scratch. However, the vulnerabilities of such generated predictive uncertainties with regard to dedicated malicious unlearning attacks remain unexplored. To bridge this gap, for the first time, we propose a new class of malicious unlearning attacks against predictive uncertainties, where the adversary aims to cause the desired manipulations of specific predictive uncertainty results. We also design novel optimization frameworks for our attacks and conduct extensive experiments, including black-box scenarios. Notably, our extensive experiments show that our attacks are more effective in manipulating predictive uncertainties than traditional attacks that focus on label misclassifications, and existing defenses against conventional attacks are ineffective against our attacks.
GLCN: Treatment Effect Estimation via Global-Local Networks with Adversarial Debiasing
Estimating Individual Treatment Effects (ITE) from observational data has been widely applied across various domains. The challenge lies in the fact that observational data only includes outcomes under one treatment, while potential outcomes under different treatments need to be inferred. We propose GLCN, a novel causal effect estimation neural network that fuses global-local modeling. The Global branch, trained on the entire dataset, captures overall causal relationships to ensure estimation stability, while the Local branch adopts a matching approach to borrow outcomes from neighboring instances, thereby capturing local heterogeneity. A gating network dynamically integrates predictions from both branches through an adaptive weighting mechanism to enhance adaptability to local variations. A key difficulty in the Local branch lies in defining a reasonable distance metric for neighboring instances. To address this, our method employs a match-and-reconstruct strategy, where the reconstruction error serves as a supervisory signal to guide learning. To mitigate confounding bias, we introduce a confusion loss based on adversarial training. Extensive experiments on public benchmarks and real-world industrial datasets demonstrate that our method outperforms state-of-the-art approaches.
LLMCE: Adapting LLMs with Adversarial Debiasing for Counterfactual Estimation over Time
Causal inference in time series data is a challenging yet crucial task in real-world applications such as healthcare and economics, where time-varying confounders often complicate causal effect estimation. In this work, we propose LLMCE, a novel approach that leverages frozen LLMs for counterfactual estimation in time series settings. First, we adapt LLMs to time series by encoding time-varying covariates, past outcomes, and past and future treatments using a reprogramming mechanism. This aligns time series with textual modalities, enabling LLMs to generalize efficiently for time series analysis. Second, to address time-varying confounders, we introduce an adversarial debiasing strategy. This ensures that learned representations predict outcomes without incorporating incremental information about future treatments beyond what can be inferred from past treatments. This dual objective enhances causal estimate reliability while maintaining predictive accuracy. We evaluate LLMCE on synthetic and real-world datasets, demonstrating superior performance compared to state-of-the-art baselines.
Land Deformation Prediction via Multi-modal Adaptive Association Learning
Accurate land deformation prediction using InSAR (Interferometric Synthetic Aperture Radar) technology is crucial for early warning of geological disasters. However, existing prediction methods face two major challenges: cross-area association bottleneck and inadequate handling of temporal distribution heterogeneity. To address these challenges, we propose Multi-modal Adaptive Association Learning framework (MAAL). For the spatial knowledge transfer challenge, we introduce a cross-area multi-modal association learning module that integrates multi-modal (InSAR and geological text) data to enable knowledge transfer between areas with similar geological characteristics. For temporal distribution heterogeneity, we develop an adaptive evolution stage recognition module that uses distribution routers to identify different temporal patterns, then applies corresponding linear extractors to model the heterogeneous landslide evolution. Experimental validation on 889 hazardous areas demonstrates that MAAL outperforms baselines.
Controlled Feature Interaction Selection for Deep Sparse Networks
Deep sparse networks (DSNs) have demonstrated exceptional performance for nonlinear estimation and feature selection, which is crucial for enhancing predictive performance and interpretability. However, existing methods often overlook feature interactions and lack theoretical guarantees on the false discovery rate (FDR), especially under interaction scenarios. To address these issues, this paper develops a DSN-based knockoffs inference framework for feature interaction selection. Knockoffs inference provides a theoretical guarantee for FDR control. Empirical evaluations on synthetic datasets demonstrate the capabilities of our proposal in FDR control and in identifying informative interactions.
ThoughtForest-KGQA: A Multi-Chain Tree Search for Knowledge Graph Reasoning
Most multi-hop Knowledge Graph Question Answering (KGQA) methods utilize fixed pruning strategies that, while efficient, critically impair the diversity of answer paths and fail to discover complex or less common correct answers. To address these limitations, this paper introduces ThoughtForest-KGQA, a novel multi-chain tree search algorithm. The method employs a dual-level reinforcement learning framework where a local-level agent optimizes individual reasoning chains by capturing fine-grained semantic details in the knowledge graph. Concurrently, a global-level agent strategically coordinates the simultaneous exploration of multiple chains. Comprehensive evaluations conducted across two distinct KGQA benchmarks reveal that this approach identifies a broader spectrum of correct answers, setting a new state-of-the-art in the field.
Towards Robust Continual Test-Time Adaptation via Neighbor Filtration
Test-Time Adaptation (TTA) aims to adapt a pre-trained source model to an unseen target domain using unlabeled target data. Continual TTA is a more challenging paradigm that deals with non-stationary environments during test-time adaptation. Most existing continual TTA methods are based on pseudo-labeling, but they often (1) rely on overconfident pseudo-labels and (2) remain unstable under continual distribution shifts, leading to error accumulation and catastrophic forgetting. To tackle these limitations, we propose Neighbor-Filtration based Continual Test-Time Adaptation (NF-CTTA), a reliable and memory-aware adaptation framework that addresses these challenges. NF-CTTA first calibrates pseudo-labels using class-conditional calibration error to correct over/under-confidence of the model. To further ensure reliability, we introduce an OOD Neighbor Filtration technique that selects a subset of high-confidence samples based on entropy and neighbor similarity, ensuring consistency within the semantic neighborhood. Finally, we propose a priority-guided memory buffer that retains the most informative low-entropy samples for replay, mitigating catastrophic forgetting across evolving test distributions. Extensive experiments across multiple domain shift benchmarks demonstrate that NF-CTTA achieves superior performance and stability compared to existing TTA and CTTA methods. The code is available at: https://github.com/takihasan/NF-CTTA.
Towards Understanding Bias in Synthetic Data for Evaluation
Test collections are crucial for evaluating Information Retrieval (IR) systems. Creating a diverse set of user queries for these collections can be challenging, and obtaining relevance judgments, which indicate how well retrieved documents match a query, is often costly and resource-intensive. Recently, generating synthetic datasets using Large Language Models (LLMs) has gained attention in various applications. While previous work has used LLMs to generate synthetic queries or documents to improve ranking models, using LLMs to create synthetic test collections is still relatively unexplored. Previous work showed that synthetic test collections have the potential to be used for system evaluation; however, more analysis is needed to validate this claim. In this paper, we thoroughly investigate the reliability of synthetic test collections constructed using LLMs, where LLMs are used to generate synthetic queries, labels, or both. In particular, we examine the potential biases that might occur when such test collections are used for evaluation. We first empirically show the presence of such bias in evaluation results and analyse the effects it might have on system evaluation. We further validate the presence of such bias using a linear mixed-effects model. Our analysis shows that while the bias present in evaluation results obtained using synthetic test collections can have a significant effect, e.g., when computing absolute system performance, its effect may not be as significant when comparing relative system performance. Code and data are available at: https://github.com/rahmanidashti/BiasSyntheticData
Quantum-Amplitude Embedded Adaptation for Parameter-Efficient Fine-Tuning in Large Language Models
Large language models (LLMs) require substantial resources for task-specific adaptation, which motivates the development of parameter-efficient fine-tuning (PEFT) methods. This paper presents quantum-amplitude embedded adaptation (QAA), a novel PEFT framework that logarithmically compresses activation vectors using quantum-amplitude embedding and applies expressive non-linear transformations via parameterized quantum circuits (PQCs). By replacing linear adapters in attention modules with compact quantum modules, QAA achieves high expressivity while drastically reducing the number of trainable parameters. Empirical results demonstrate that QAA performs on par with or better than existing PEFT methods under constrained memory and compute budgets, highlighting its potential for efficient LLM fine-tuning.
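Quantum-amplitude embedding is what gives QAA its logarithmic compression: a length-d activation vector is padded to the next power of two and L2-normalized so that it can serve as the amplitude vector of a ceil(log2 d)-qubit state. The sketch below illustrates only this encoding step in plain NumPy, without the PQC; the function name and zero-padding choice are illustrative assumptions.

import numpy as np

def amplitude_embed(x):
    """Encode a real vector as quantum state amplitudes: pad to the next
    power of two, then L2-normalize. d values need only ceil(log2 d) qubits."""
    x = np.asarray(x, dtype=float)
    n_qubits = int(np.ceil(np.log2(len(x))))
    padded = np.zeros(2 ** n_qubits)
    padded[: len(x)] = x
    norm = np.linalg.norm(padded)
    return (padded / norm if norm > 0 else padded), n_qubits

state, n_qubits = amplitude_embed(np.random.randn(768))   # a 768-dim activation
print(n_qubits, np.isclose(np.linalg.norm(state), 1.0))   # 10 qubits, unit norm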
Green by Design: Detecting Environmental Claims in Corporate Web Content
Corporate entities increasingly embed environmental claims in their digital communication to project sustainability awareness. Detecting such claims is critical for regulatory monitoring, corporate accountability, and mitigation of greenwashing practices. However, traditional neural network architectures, including large language models, struggle to capture both the complex linguistic structures and the subtle stylistic cues that characterize environmental assertions. In this work, we propose a novel Graph-Augmented Liquid Neural Network (GLNN) architecture for automatic detection of environmental claims in corporate web content. Our approach first models the syntactic and semantic dependencies of text using a Graph Convolutional Network (GCN), while concurrently encoding stylistic features derived from linguistic markers (e.g., LIWC categories) into vector representations. These representations are concatenated and passed into a Liquid Time-Constant (LTC) Network, which provides dynamic adaptability and low-power efficiency by leveraging continuous-time recurrent dynamics. The integration of GCN-based and stylistic encoding with LTC networks enables the model to robustly capture both structural dependencies and temporal signal variations inherent in corporate claims, while remaining energy efficient. Extensive experiments on multiple open datasets demonstrate that our model outperforms baseline neural architectures in both accuracy and computational efficiency, highlighting the potential of graph-augmented liquid networks as a foundation for sustainable AI in sustainability monitoring.
Towards Equitable Coreset Selection: Addressing Challenges Under Class Imbalance
Coreset selection reduces training cost by constructing compact, representative subsets, but existing methods largely assume balanced class distributions. Under imbalance, this assumption yields biased subsets that discard critical minority samples and degrade accuracy. We propose Equitable Coreset Selection (ECS), a framework tailored for imbalanced data. ECS mitigates these issues through adaptive pruning that preserves minority examples, class-sensitive partitioning aligned with skewed class distributions, and stratified graph-cut selection for diverse sampling. Experiments across multiple imbalanced datasets show that ECS improves generalization and substantially boosts minority-class accuracy compared to standard coreset methods.
Open-Source LLM-based Relevance Assessment vs. Highly Reliable Manual Relevance Assessment: A Case Study
There is currently a controversy as to whether LLM-based relevance assessment can replace manual relevance assessment for evaluating search engines accurately, at least at the run level (e.g., ranking TREC runs by mean nDCG) if not at the individual topic level (e.g., computing an nDCG score for a Search Engine Result Page). This study utilises an NTCIR web search test collection that features highly reliable human relevance labels (reflecting the collective view of eight independent assessors per topic) to complement prior findings from the skeptic camp. Our experiments show that LLM-based assessment (using Llama and Qwen) cannot replace human assessment even for ranking systems in terms of mean nDCG. More importantly, LLM-based assessment lacks discriminative power: it misses many statistically significant differences that manual assessment can detect. Furthermore, LLM-based assessment occasionally yields potential false alarms in terms of statistical significance, which may lead researchers to incorrect conclusions.
LLM-as-a-Judge in Entity Retrieval: Assessing Explicit and Implicit Relevance
Entity retrieval plays a critical role in information access systems, yet the development and evaluation of retrieval models remain constrained by the limited availability of high-quality supervision. While recent work has demonstrated the utility of large language models (LLMs) as relevance assessors in passage and document retrieval, their reliability in the context of entity retrieval, where targets are abstract, underspecified, and often semantically sparse, remains unexplored. In this work, we evaluate LLM-based judgments against two complementary supervision signals: human-annotated relevance labels from the DBpedia-Entity benchmark and implicit feedback from user clicks in the LaQuE dataset. We show that LLMs exhibit strong agreement with expert annotations and replicate user click patterns with over 91% agreement, suggesting alignment with behavioral judgments despite noisy input queries. We further identify and analyze systematic mismatches for user clicks on irrelevant entities. Our findings establish LLMs not only as effective annotators for entity relevance judgment (even when given only the entity title) but also as powerful tools for predicting click-through behavior and simulating explainable user intent. Our code, prompts, and data are publicly available at: https://github.com/17shiraz/ClickLLM
CondFairGen: A Fair Conditional Generator for Tabular Data via Adaptive Sampling
Recent advances in synthetic data generation have enabled high-fidelity modeling of tabular datasets, yet fairness remains a peripheral concern, often addressed through architectural modifications or fairness-aware loss functions. We introduce CondFairGen, a fairness-aware generative model that enforces group fairness through dynamic control of conditional exposure during training. Rather than altering the model architecture or objective, CondFairGen reweights the sampling distribution over conditioning vectors based on disparity metrics across protected attributes and their intersections. This reweighting increases exposure to underrepresented or high-disparity subgroups, guiding the model toward fairer conditional distributions. By embedding fairness directly into the training schedule, CondFairGen offers a principled alternative to adversarial debiasing or post hoc correction. Empirical evaluations on standard tabular benchmarks demonstrate that CondFairGen substantially improves both marginal and intersectional fairness metrics while preserving downstream utility. These results establish conditional exposure as a practical and effective mechanism for fairness intervention in generative modeling.
HF-RAG: Hierarchical Fusion-based RAG with Multiple Sources and Rankers
Leveraging both labeled (input-output associations) and unlabeled data (wider contextual grounding) may provide complementary benefits in retrieval augmented generation (RAG). However, effectively combining evidence from these heterogeneous sources is challenging as the respective similarity scores are not inter-comparable. Additionally, aggregating beliefs from the outputs of multiple rankers can improve the effectiveness of RAG. Our proposed method first aggregates the top documents from a number of IR models using a standard rank fusion technique for each source (labeled and unlabeled). Next, we standardize the retrieval score distributions within each source by applying a z-score transformation before merging the top-retrieved documents from the two sources. We evaluate our approach on the fact verification task, demonstrating that it consistently improves over the best-performing individual ranker or source and also shows better out-of-domain generalization.
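As a concrete illustration of the two-level fusion, the sketch below assumes reciprocal rank fusion as the "standard rank fusion technique" within each source and then z-score-standardizes the fused scores before merging; the document ids, ranker outputs, and the max-merge of the two pools are hypothetical choices, not the paper's exact pipeline.

import numpy as np

def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion over several rankers' ranked doc-id lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return scores

def zscore(scores):
    """Standardize a {doc: score} map so scores from different sources
    become comparable before merging."""
    vals = np.array(list(scores.values()))
    mu, sigma = vals.mean(), vals.std() + 1e-9
    return {doc: (s - mu) / sigma for doc, s in scores.items()}

# Hypothetical top-k lists from two rankers per source.
labeled = zscore(rrf_fuse([["d1", "d2", "d3"], ["d2", "d1", "d4"]]))
unlabeled = zscore(rrf_fuse([["d5", "d2", "d6"], ["d2", "d6", "d1"]]))

merged = {}
for pool in (labeled, unlabeled):
    for doc, s in pool.items():
        merged[doc] = max(merged.get(doc, -np.inf), s)   # keep the best standardized score
print(sorted(merged, key=merged.get, reverse=True)[:3])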
Effect of Model Merging in Domain-Specific Ad-hoc Retrieval
In this study, we evaluate the effect of model merging in ad-hoc retrieval tasks. Model merging is a technique that combines the diverse characteristics of multiple models. We hypothesized that applying model merging to domain-specific ad-hoc retrieval tasks could improve retrieval effectiveness. To verify this hypothesis, we merged the weights of a source retrieval model and a domain-specific (non-retrieval) model using a linear interpolation approach. A key advantage of our approach is that it requires no additional fine-tuning of the models. We conducted two experiments each in the medical and Japanese domains. The first compared the merged model with the source retrieval model, and the second compared it with a LoRA fine-tuned model under both full and limited data settings for model construction. The experimental results indicate that model merging has the potential to produce more effective domain-specific retrieval models than the source retrieval model, and may serve as a practical alternative to LoRA fine-tuning, particularly when only a limited amount of data is available.
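A minimal sketch of the linear-interpolation merge is given below, assuming the two models share an identical architecture so their state dicts have matching keys and floating-point parameters; the mixing weight alpha and the toy nn.Linear example are illustrative, not the paper's configuration.

import torch

def merge_state_dicts(retrieval_sd, domain_sd, alpha=0.5):
    """Linearly interpolate two compatible state dicts:
    merged = alpha * retrieval + (1 - alpha) * domain.
    No further fine-tuning is required after loading the result."""
    return {name: alpha * w + (1 - alpha) * domain_sd[name]
            for name, w in retrieval_sd.items()}

# Toy usage with two models of the same architecture:
retriever, domain_lm = torch.nn.Linear(4, 2), torch.nn.Linear(4, 2)
merged = merge_state_dicts(retriever.state_dict(), domain_lm.state_dict(), alpha=0.7)
retriever.load_state_dict(merged)   # retriever now carries 30% domain-model weights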
cMALC-D: Contextual Multi-Agent LLM-Guided Curriculum Learning with Diversity-Based Context Blending
Many multi-agent reinforcement learning (MARL) algorithms are trained in fixed simulation environments, making them brittle when deployed in real-world scenarios with more complex and uncertain conditions. Contextual MARL (cMARL) addresses this by parameterizing environments with context variables and training a context-agnostic policy that performs well across all environment configurations. Existing cMARL methods attempt to use curriculum learning to help train and evaluate context-agnostic policies, but they often rely on unreliable proxy signals, such as value estimates or generalized advantage estimates that are noisy and unstable in multi-agent settings due to inter-agent dynamics and partial observability. To address these issues, we propose Contextual Multi-Agent LLM-Guided Curriculum Learning with Diversity-Based Context Blending (cMALC-D), a framework that uses Large Language Models (LLMs) to generate semantically meaningful curricula and provide a more robust evaluation signal. To prevent mode collapse and encourage exploration, we introduce a novel diversity-based context blending mechanism that creates new training scenarios by combining features from prior contexts. Experiments in traffic signal control domains demonstrate that cMALC-D improves both generalization and sample efficiency compared to existing curriculum learning baselines.
Membership Inference Attack Vulnerabilities of Record Linkage Models
Record linkage plays a crucial role in integrating health, legal, and administrative data. In domains such as healthcare, unstructured records, including clinical notes, contain rich information, making their integration valuable for applications like clinical trials. While deep learning models have improved linkage quality, their privacy risks remain under-explored. We present what is, to our knowledge, the first systematic study of membership inference attack vulnerabilities in record linkage models trained on de-identified texts. Unlike traditional classifiers, linkage models operate on record pairs, prompting a rethinking of what constitutes membership leakage. Does leakage occur only if a record pair was seen together during training, or also if its records were seen individually? Or if neither was seen, but the pair resembles known patterns? We introduce a black-box attack based on semantic perturbation sensitivity, requiring no access to the model's internals. Our findings expose a previously unaddressed membership inference vulnerability in record linkage models: black-box attacks, even with a simple threshold-based attack model, achieved up to 92% precision and an AUC of 0.79. Across record linkage models trained on the MIMIC-IV and PMC-Patients datasets, we observe that perturbing training-seen phrases causes significantly larger confidence shifts (e.g., Δs = -0.648) and higher rates of predicted-label flips (from 0 = non-match to 1 = match and vice versa; e.g., 68.75%) compared to unseen variants. These preliminary results reveal membership inference vulnerabilities in text-based linkage systems, highlighting the need for deeper investigation into privacy risks and motivating new lines of privacy defences for pairwise models.
Do LLMs Dream of Electric Emotions? Towards Quantifying Metacognition and Generalizing the Teacher-Student Model Using Ensembles of LLMs
In this paper, we propose a novel framework for quantifying metacognitive processes in ensembles of Large Language Models (LLMs) and extending the traditional teacher-student model through the lens of dual-process cognitive theory. We introduce a Metacognitive State Vector (MSV) that operationalizes metacognition across five dimensions: emotional response analysis, correctness evaluation, experiential matching, conflicting information estimation, and problem-importance-based task prioritization. In our formulation, the rapid, intuitive thinking of System 1 is mapped onto a smaller ''student'' (single LLM or ensemble of bagged LLMs) while the deliberate, analytical reasoning of System 2 is mapped onto a larger ''teacher'' (ensemble of boosted LLMs) using the MSV. Additionally, we utilize a graph-theoretic architecture to model ensemble interactions, enabling LLMs to assume dynamic roles and transition between System 1 and System 2 for improved decision-making. We view this work as a first step towards establishing a true measure of emergent metacognition in such systems.
FAIR Data Assessment Using LLMs: The Fair-Way
As part of modern research practices, the FAIR data principles have become essential for data discoverability, usability, and sharing. Existing implementations for automatically assessing FAIR adherence (FAIRness) often suffer from limited usability, inconsistent accuracy, and difficult-to-interpret results, as they require explicit rules tailored to specific FAIR assessment frameworks, which are not easy to generalize. This paper introduces Fair-Way, an open source tool that leverages Large Language Models (LLMs) to automate FAIRness assessment. Fair-Way applies a divide-and-conquer approach to decompose the assessment process into fine-grained tasks, as well as to split the metadata into manageable chunks. Evaluation demonstrates that Fair-Way achieves performance comparable to existing tools, while outperforming them in several key metrics. Moreover, Fair-Way generalizes across FAIR assessment indicators without requiring explicitly programmed logic and supports both structured and unstructured metadata in diverse formats. Finally, it enables user-defined, domain-specific tests, which are typically not supported by other systems. Overall, Fair-Way represents a scalable and flexible solution to accelerate FAIR data practices across research domains.
The Metadata Impedance Mismatch between Databases and Programming Languages
This paper identifies a problem with databases that support metadata. Previous research has proposed annotating values stored in a database with metadata, such as time, security, privacy, and quality. The metadata influences how a value is used. For example, sequenced temporal semantics proscribes comparing a value to one alive at a different time. But when values stored in a database are pulled into the realm of a programming language through an API, web service, or a user-defined function, a step-down transformation of the data occurs. The transformation strips the metadata, changing the semantics of the value. The metadata is discarded because a programming language processes a scalar value, not one annotated with metadata. This metadata-related impedance mismatch between databases and programming languages limits the real-world adoption of databases that support metadata.
LLM4ES: Learning User Embeddings from Event Sequences via Large Language Models
This paper presents LLM4ES, a novel framework that exploits large pre-trained language models (LLMs) to derive user embeddings from event sequences. Event sequences are transformed into a textual representation, which is subsequently used to fine-tune an LLM through next-token prediction to generate high-quality embeddings. We introduce a text enrichment technique that enhances LLM adaptation to event sequence data, improving representation quality for low-variability domains. Experimental results demonstrate that LLM4ES achieves state-of-the-art performance in user classification tasks in financial and other domains, outperforming existing embedding methods. The resulting user embeddings can be incorporated into a wide range of applications, from user segmentation in finance to patient outcome prediction in healthcare.
Company-Specific Knowledge Matters: Retrieval-Augmented Generation for Earnings Call Answer Rehearsal
Retrieval-augmented generation (RAG) has long been used to guide generative models in producing more accurate answers, with most discussions focusing on reading comprehension-based question answering (QA). However, its role in real-world answer rehearsal scenarios remains underexplored. The rise of large language models (LLMs) presents new opportunities to develop systems that assist professionals, making this discussion both timely and essential. This paper explores how to better support corporate executives in answering questions from professional analysts during earnings calls. We compare the impact of two external knowledge sources: large-scale causal knowledge graphs (KGs) and historical Q&A records, retrieved either from a global pool or from company-specific archives. Our findings suggest that a company's historical Q&A records are more influential than causal KGs in improving response quality. To the best of our knowledge, this is the first study to systematically compare and analyze different knowledge resources in answer rehearsal. Our findings have the potential to inspire future research on the interplay between KGs and historical QA pairs for answer rehearsal.
Zero-shot Stroke Lesion Segmentation via CAM-guided Prompting of MedSAM2
Accurate segmentation of stroke lesions in diffusion-weighted imaging (DWI) is crucial for clinical decision-making. However, automated infarct segmentation remains challenging due to variable infarct sizes and locations, and it is labor-intensive, requiring expert manual annotations for training. We propose a zero-shot framework to eliminate the need for manual segmentation labels by leveraging weak supervision from class activation maps (CAMs) to guide segmentation using MedSAM2, a foundation model for 3D medical image segmentation. By extracting attention maps from a fine-tuned ResNet on DWI scans labeled with stroke etiology (cause) and combining them with intensity information, we identify key regions and generate bounding-box prompts for MedSAM2. Our method achieves a Dice score of 54.2 ± 5.3% without any manual segmentation labels or tuning of the MedSAM2 model, demonstrating its potential as a scalable solution for reliable pseudo-label generation.
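The prompt-generation step can be made concrete with the sketch below: a normalized CAM is combined with a DWI intensity cue (acute infarcts are typically hyperintense), and the bounding box of the surviving region is returned, which would then be passed to MedSAM2 as a box prompt. The thresholds and the exact way CAM and intensity are combined are illustrative assumptions, not the paper's exact settings.

import numpy as np

def cam_to_box_prompt(cam, dwi_slice, cam_thresh=0.6, intensity_pct=90):
    """Turn a class activation map plus an intensity cue into a bounding-box prompt:
    keep pixels where the normalized CAM is high AND the DWI intensity is high,
    then take the bounding box of the surviving region."""
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    mask = (cam >= cam_thresh) & (dwi_slice >= np.percentile(dwi_slice, intensity_pct))
    if not mask.any():
        return None
    ys, xs = np.nonzero(mask)
    return [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]   # x0, y0, x1, y1

# Hypothetical 2D slice and CAM of matching shape:
cam = np.random.rand(128, 128)
dwi = np.random.rand(128, 128)
print(cam_to_box_prompt(cam, dwi))   # box to pass as a prompt to MedSAM2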
On Influence Tail Bounds in Online Social Networks
The influence estimation and maximization problems study the expected reach of a seed set in social networks under a stochastic propagation model. Motivated by the practical utility of characterizing the distribution of reach values, we systematically analyze the tail behaviour of the reach of a seed set. We study tail bound query problems that, for a given seed set, compute either the maximum reach for a given probability threshold or the highest probability of achieving a target reach. We prove #P-hardness and propose algorithms that balance efficiency and accuracy. We also examine tail bound optimization problems that find a seed set maximizing reach for a target probability or maximizing the probability of achieving a target reach, and establish strong inapproximability results. Experiments on real datasets demonstrate the effectiveness of our algorithms, showing that good approximations of the actual reach for a desired probability can be computed efficiently and that the actual reach can be very different from the expected reach.
Caption, Create, Continue: Continual Learning with Pre-trained Generative Vision-Language Models
Continual learning (CL) enables models to adapt to evolving data streams without catastrophic forgetting, a fundamental requirement for real-world AI systems. However, current methods often depend on large replay buffers or heavily annotated datasets, which are impractical due to storage, privacy, and cost constraints. We propose CLTS (Continual Learning via Text-Image Synergy), a novel class-incremental framework that mitigates forgetting without storing real task data. CLTS leverages pre-trained vision-language models, BLIP (Bootstrapping Language-Image Pre-training) for caption generation and Stable Diffusion for sample generation. Each task is handled by a dedicated Task Head, while a Task Router learns to assign inputs to the correct Task Head using the generated data. On three benchmark datasets, CLTS improves average task accuracy by up to 54% and achieves 63 times better memory efficiency compared to four recent continual learning baselines, demonstrating improved retention and adaptability. CLTS introduces a novel perspective by integrating generative text-image augmentation for scalable continual learning.
Filtered One-Shot Training for Quantum Architecture Search
Quantum neural networks (QNNs) have attracted growing interest for their potential to accelerate computation and leverage quantum supremacy. Their performance largely depends on gate placement within parameterized quantum circuits (PQCs), making neural architecture search (NAS) a suitable approach for discovering efficient structures. However, applying conventional NAS to QNNs is hindered by barren plateaus, noise, hardware constraints, and high computational costs. To address these challenges, this paper proposes Filtered One-Shot Training for Quantum Architecture Search, which combines deep reinforcement learning (DRL) for constraint-aware gate placement with a one-shot supernet for efficient weight sharing. A filtering mechanism further removes weak paths to narrow the search space. Experimental results show that the method reduces parameters by up to 76% while maintaining high accuracy.
Sparse and Dense Retrievers Learn Better Together: Joint Sparse-Dense Optimization for Text-Image Retrieval
Vision-Language Pretrained (VLP) models have achieved impressive performance on multimodal tasks, including text-image retrieval, based on dense representations. Meanwhile, Learned Sparse Retrieval (LSR) has gained traction in text-only settings due to its interpretability and efficiency with fast term-based lookup via inverted indexes. Inspired by these advantages, recent work has extended LSR to the multimodal domain. However, these methods often rely on computationally expensive contrastive pre-training, or distillation from a frozen dense model, which limits the potential for mutual enhancement. To address these limitations, we propose a simple yet effective framework that enables bi-directional learning between dense and sparse representations through Self-Knowledge Distillation. This bi-directional learning is achieved using an integrated similarity score, a weighted sum of dense and sparse similarities, which serves as a shared teacher signal for both representations. To ensure efficiency, we fine-tune the final layer of the dense encoder and the sparse projection head, enabling easy adaptation of any existing VLP model. Experiments on MSCOCO and Flickr30k demonstrate that our sparse retriever not only outperforms existing sparse baselines, but also achieves performance comparable to, or even surpassing, its dense counterparts, while retaining the benefits of sparse models.
Evaluating Differentially Private Generation of Domain-Specific Text
Generative AI offers transformative potential for high-stakes domains such as healthcare and finance, yet privacy and regulatory barriers hinder the use of real-world data. To address this, differentially private synthetic data generation has emerged as a promising alternative. In this work, we introduce a unified benchmark to systematically evaluate the utility and fidelity of text datasets generated under formal Differential Privacy (DP) guarantees. Our benchmark addresses key challenges in domain-specific benchmarking, including choice of representative data and realistic privacy budgets, accounting for pre-training and a variety of evaluation metrics. We assess state-of-the-art privacy-preserving generation methods across five domain-specific datasets, revealing significant utility and fidelity degradation compared to real data, especially under strict privacy constraints. These findings underscore the limitations of current approaches, outline the need for advanced privacy-preserving data sharing methods and set a precedent regarding their evaluation in realistic scenarios.
Random-Feature Graph Neural Networks with Representation Tokenized Transformer for Robust One-class Anomaly Detection
Real-world anomaly detection seldom enjoys abundant, perfectly curated normal data; labels are often limited and untrustworthy, allowing genuine outliers to masquerade as benign. We cast one-class anomaly detection into this harsh setting of label scarcity and noise and present Random-Feature Graph Neural Networks with a Representation Tokenized Transformer (RFGT). RFGT first partitions the feature space into random, non-overlapping subsets, yielding multiple complementary views that dilute the impact of any corrupted dimension. With a graph constructed from each view, a graph learner propagates the sparse normal signal to unlabeled neighbors while tempering mislabeled anomalies. A novel Representation Tokenized Transformer module is learned to capture cross-feature dependencies and implicitly down-weight inconsistent signals within a data instance. Extensive experiments on eight tabular benchmarks show that RFGT outperforms the state-of-the-art as the amount of clean data shrinks or the label noise grows.
NuFact: Validating Numerical Assertions for Knowledge Graphs
Validating extracted assertions is one of the crucial steps for curating knowledge graphs (KGs). While existing research extensively explores methods for validating KG assertions, specifically for categorical facts, where both the Subject and Object in triples are entities, there remains a significant gap in the validation of numerical assertions, where the Object represents a quantity. Moreover, general fact-validation methods are inefficient for validating numerical claims due to their limited coverage in KGs. Furthermore, large language models (LLMs) exhibit limitations in quantitative reasoning, further exacerbating the challenge. These gaps compromise the reliability of KGs in applications that require precise numerical accuracy. Addressing these challenges, we propose NuFact, a framework for validating numerical assertions using evidence gathered from the web. NuFact combines the rich contextual understanding of LLMs with manually crafted quantity-focused and temporal features derived from extracted evidence to assess numerical claims. Experimental evaluations show that NuFact significantly outperforms existing fact-checking baselines and popular LLM-powered agents.
Frozen in the Middle: Hidden States Remain Unchanged Across Intermediate Layers of Language Models
This paper investigates the internal mechanisms of large language models (LLMs) through the lens of Mechanistic Interpretability (MI). We present novel findings on how information is processed and propagated within these models. Our key contributions include: (1) providing evidence for the localized nature of fact storage and information propagation from subject tokens; (2) introducing a new observation that hidden states remain largely unchanged across multiple middle layers, which we call the ''plateau'' phenomenon; and (3) developing a manually crafted diagnostic dataset of factual prompts. Our work complements and extends prior research on transformer information flow by demonstrating that, contrary to the prevailing assumption of sequential representation enrichment across layers, subject token states stabilize early and remain functionally static throughout multiple middle layers while containing all necessary information for the final prediction. These insights advance our understanding of how transformers process factual information and suggest a more complex pattern of layer specialization than previously identified.
A Universal Framework for Offline Serendipity Evaluation in Recommender Systems via Large Language Models
Serendipity in recommender systems (RSs) has attracted increasing attention as a concept that enhances user satisfaction by presenting unexpected and useful items. However, evaluating serendipitous performance remains challenging because its ground truth is generally unobservable. The existing offline metrics often depend on ambiguous definitions or are tailored to specific datasets and RSs, thereby limiting their generalizability. To address this issue, we propose a universally applicable evaluation framework that leverages large language models (LLMs), known for their extensive knowledge and reasoning capabilities, as evaluators. First, to improve the evaluation performance of the proposed framework, we assessed the serendipity prediction accuracy of LLMs using four different prompt strategies on a dataset containing user-annotated serendipitous ground truth and found that the chain-of-thought prompt achieved the highest accuracy. Next, we re-evaluated the serendipitous performance of both serendipity-oriented and general RSs using the proposed framework on three commonly used real-world datasets, without the ground truth. The results indicated that no serendipity-oriented RS consistently outperformed across all datasets, and that a general RS sometimes achieved higher performance than serendipity-oriented RSs.
Densest Subgraph Discovery on Decentralized Graphs with Local Edge Differential Privacy
Various real-world graphs, such as social and transaction networks, are typically distributed across users, each of whom holds a local view of the graph (i.e., their own relationships with others). Densest Subgraph Discovery (DSD) on such decentralized graphs is a fundamental task that can uncover valuable insights for downstream applications, including fraud detection, community identification, and user behavior mining. Additionally, in many scenarios, due to privacy concerns, sensitive original local views cannot be collected for DSD. Although there have been extensive studies on DSD, most existing algorithms either do not take user privacy into account or are specific to the centralized privacy setting that requires a (trusted) curator to collect all local views from users and then analyze the entire graph privately. To address these issues, we consider DSD under Local Edge Differential Privacy (LEDP), which allows users to perturb their local graph views via a randomizer such that any single edge is indistinguishable from the data transmitted to the server. We propose a new LEDP algorithm for DSD that utilizes the Randomized Response (RR) mechanism for user-side perturbation and extends greedy peeling with degree correction to find the densest subgraph on the server-side noisy global graph. Our proposed algorithm provides provable privacy and approximation guarantees. Finally, we perform experimental evaluations on real-world graphs to show that our proposed algorithm achieves better privacy-utility trade-offs than state-of-the-art LEDP baselines for DSD.
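To make the user-side perturbation and server-side degree correction concrete, the sketch below applies the Randomized Response mechanism to one user's adjacency row and then inverts its expected bias to estimate the true degree; the greedy peeling step on the noisy global graph is omitted, and the epsilon value is illustrative rather than taken from the paper.

import numpy as np

def randomized_response(adj_row, epsilon):
    """User-side perturbation: flip each adjacency bit with probability
    1 / (1 + e^epsilon), which satisfies epsilon-local edge DP per bit."""
    p_keep = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    flips = np.random.rand(len(adj_row)) > p_keep
    return np.where(flips, 1 - adj_row, adj_row)

def corrected_degree(noisy_row, epsilon):
    """Server-side unbiased degree estimate from a noisy row:
    E[noisy_deg] = p*deg + (1-p)*(n - deg), so invert this linear map."""
    n = len(noisy_row)
    p = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    return (noisy_row.sum() - (1 - p) * n) / (2 * p - 1)

row = np.zeros(1000, dtype=int)
row[:50] = 1                                     # a user whose true degree is 50
noisy = randomized_response(row, epsilon=2.0)
print(round(corrected_degree(noisy, epsilon=2.0), 1))   # close to 50 in expectation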
Channel-Independent Refiner for Multivariate Time Series Forecasting
Real-world time series data are usually multivariate with complex channel relations. Some channels are highly related, while others show limited correlation. Channel independence has shown great significance in multivariate time series forecasting to capture better individual channel characteristics. To better learn channel patterns, we propose a channel-independent Refiner as a plug-and-play module. Specifically, we devise a channel-wise Refiner connected to the output of existing methods. Leveraging the concatenation of the original input and the coarse prediction from the base model, the Refiner produces a better estimation. Our Refiner also benefits from the channel-independent design and a post-training strategy, achieving significant improvement over base models. Extensive experiments on iTransformer, FEDformer, Autoformer, FreTS, DLinear and TSMixer demonstrate that our Refiner reduces forecasting errors on both Transformer-based and MLP-based models in over 90% of the experimental settings. Our Refiner combined with iTransformer establishes the new state-of-the-art.
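A minimal sketch of such a channel-independent refiner is shown below: a small network, shared across channels, maps the concatenation of the input window and the base model's coarse prediction to a correction. Treating the output as a residual added to the coarse prediction, and the layer sizes, are illustrative assumptions rather than the paper's exact design.

import torch
import torch.nn as nn

class ChannelIndependentRefiner(nn.Module):
    """Plug-and-play refiner: for each channel independently, map the concatenation
    of the original input window and the base model's coarse prediction to a refined
    prediction. Weights are shared across channels (channel-independent design)."""
    def __init__(self, input_len, pred_len, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_len + pred_len, hidden),
            nn.GELU(),
            nn.Linear(hidden, pred_len),
        )

    def forward(self, x, coarse):
        # x: [batch, input_len, channels], coarse: [batch, pred_len, channels]
        z = torch.cat([x, coarse], dim=1).transpose(1, 2)   # [B, C, L_in + L_pred]
        refined = self.net(z).transpose(1, 2)               # [B, pred_len, C]
        return coarse + refined                             # residual correction (assumption)

refiner = ChannelIndependentRefiner(input_len=96, pred_len=24)
x, coarse = torch.randn(8, 96, 7), torch.randn(8, 24, 7)
print(refiner(x, coarse).shape)   # torch.Size([8, 24, 7])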
Social Relation Meets Recommendation: Augmentation and Alignment
Recommender systems are essential for modern content platforms, yet traditional behavior-based models often struggle with cold users who have limited interaction data. Engaging these users is crucial for platform growth. To bridge this gap, we propose leveraging the social-relation graph to enrich interest representations from behavior-based models. However, extracting value from social graphs is challenging due to relation noise and cross-domain inconsistency. To address the noise propagation and obtain accurate social interest, we employ a dual-view denoising strategy, applying low-rank SVD to the user-item interaction matrix to obtain a denoised social graph and contrastive learning to align the original and reconstructed social graphs. Addressing the interest inconsistency between social and behavioral interests, we adopt a ''mutual distillation'' technique to isolate the original interests into aligned social/behavior interests and social/behavior specific interests, maximizing the utility of both. Experimental results on widely adopted industry datasets verify the method's effectiveness, particularly for cold users, offering a fresh perspective for future research. The implementation can be accessed at https://github.com/WANGLin0126/CLSRec.
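The low-rank denoising idea can be sketched as follows: factorize the user-item interaction matrix, measure user-user similarity in the resulting low-rank space, and keep only social edges whose endpoints remain similar there. The rank, the cosine-similarity criterion, and the quantile-based edge filter are illustrative assumptions, not the paper's exact construction.

import numpy as np

def svd_denoised_social_graph(interaction, social, rank=32, keep_ratio=0.9):
    """Sketch: rank-k user factors from the interaction matrix define a denoised
    user space; a social edge survives only if its two users are also similar there."""
    U, s, _ = np.linalg.svd(interaction, full_matrices=False)
    user_emb = U[:, :rank] * s[:rank]                      # denoised user factors
    norms = np.linalg.norm(user_emb, axis=1, keepdims=True) + 1e-8
    sim = (user_emb / norms) @ (user_emb / norms).T        # user-user cosine similarity
    edge_sims = sim[social > 0]
    thresh = np.quantile(edge_sims, 1 - keep_ratio)        # drop the least consistent edges
    return np.where((social > 0) & (sim >= thresh), 1, 0)

interaction = (np.random.rand(200, 500) < 0.02).astype(float)
social = (np.random.rand(200, 200) < 0.05).astype(float)
denoised = svd_denoised_social_graph(interaction, social, rank=16)
print(denoised.sum(), social.sum())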
Latent Graph Structure Learning for Large-Scale Traffic Forecasting
Large-scale traffic forecasting poses new challenges in model architecture design due to the large data volume and high computation complexity. An effective solution is to partition traffic observation locations into several patches so that the computation burden decreases from node to patch level. However, since traffic states on the road network are highly dynamic, current static and hard partitioning methods cannot fully adapt to locations whose traffic semantics and evolving patterns vary across time. Therefore, we posit that locations on the road network latently belong to certain graph structures and propose an adaptive and dynamic patching method in a data-driven fashion. Besides, we design an end-to-end framework to learn patch assignments and predict future traffic states simultaneously. Experiments on real-world large-scale traffic datasets further verify the effectiveness and interpretability of our proposed method.
DAGP: Difficulty-Aware Graph Pruning for LLM-Based Multi-Agent System
Large Language Model-based multi-agent systems demonstrate strong capabilities across different tasks. However, current methods often rely on static or task-specific designs. These instance-agnostic methods overlook varying instance complexities, leading to redundant communication in simple scenarios and insufficient coordination in demanding ones. Consequently, they fail to achieve effective and efficient collaborative reasoning among agents. To overcome these limitations, we propose Difficulty-Aware Graph Pruning (DAGP), an adaptive framework that configures communication structures based on instance-specific difficulty. DAGP integrates a difficulty estimation module and a sparsity control mechanism to selectively activate communication edges for each instance, promoting efficient and targeted collaboration. Empirical evaluations across diverse benchmarks demonstrate that DAGP consistently achieves state-of-the-art performance compared to other baselines while reducing average token usage by 45%.
Unified Robustness via Spurious-Invariant Features and On-Manifold Adversaries
Vision models fail both under tiny pixel attacks and under real-world shifts in style or background because they latch onto spurious features. We propose a two-step, label-free method. (1) Spurious-Invariant Self-Supervised Pre-training (SISSP) trains an encoder to collapse representations of the same object despite randomized styles and backgrounds, pruning shortcut signals. (2) Semantic-Alignment Adversarial Refinement (SAAR) takes any attack and projects it back into a small ball within SISSP feature space, yielding adversaries that look natural yet still fool the classifier. Fine-tuning with SISSP features and SAAR images produces a ResNet-50 that retains 64% ImageNet accuracy and 46% PGD robustness without environment labels or specialized augmentations. Together, SISSP provides a semantics-aware metric and SAAR generates on-manifold adversaries, achieving the first ImageNet-scale model robust to both pixel-level noise and semantic shifts.
GenR1-Searcher: Curriculum Reinforcement Learning for Dynamic Retrieval and Document Generation
Current RAG approaches follow two paradigms with complementary limitations: retrieve-then-read methods access reliable sources but produce noisy and incomplete information, while generate-then-read approaches create query-aligned documents but suffer from hallucinations. We propose GenR1-Searcher, a curriculum-based reinforcement learning framework that enables small language models to intelligently decide between retrieval and document generation during multi-hop reasoning through a three-stage progressive training strategy: first learning tool invocation syntax through format rewards, then mastering retrieval strategies with answer-based rewards, and finally acquiring adaptive tool selection capabilities when both knowledge sources are available. Extensive experiments on four multi-hop QA benchmarks demonstrate that GenR1-Searcher consistently outperforms competitive baselines, including Search-o1, Search-R1, and ReARTeR, achieving substantial relative improvements of 26.8%, 21.2%, and 15.7% on HotpotQA, 2WikiMultiHopQA, and MuSiQue, respectively. Our analysis reveals that the model learns principled tool selection strategies that adapt to tool capabilities and query characteristics.
Temporal-Aware User Behaviour Simulation with Large Language Models for Recommender Systems
Large Language Models (LLMs) demonstrate human-like capabilities in language understanding, reasoning, and generation, driving interest in using LLM-based agents to simulate human feedback in recommender systems. However, most existing approaches rely on static user profiling, neglecting the temporal and dynamic nature of user interests. This limitation stems from a disconnect between language modelling and behaviour modelling, which constrains the capacity of agents to represent sequential patterns. To address this challenge, we propose a Dynamic Temporal-aware Agent-based simulator for Recommender Systems, DyTA4Rec, which enables agents to model and utilise evolving user behaviour based on historical interactions. DyTA4Rec features a dynamic updater for real-time profile refinement, temporal-enhanced prompting for sequential context, and self-adaptive aggregation for coherent feedback. Experimental results at group and individual levels show that DyTA4Rec significantly improves the alignment between simulated and actual user behaviour by modelling dynamic characteristics and enhancing temporal awareness in LLM-based agents.
Can Large Vision-Language Models Understand Multimodal Sarcasm?
Sarcasm is a complex linguistic phenomenon that involves a disparity between literal and intended meanings, making it challenging for sentiment analysis and other emotion-sensitive tasks. While traditional sarcasm detection methods primarily focus on text, recent approaches have incorporated multimodal information. However, the application of Large Visual Language Models (LVLMs) in Multimodal Sarcasm Analysis (MSA) remains underexplored. In this paper, we evaluate LVLMs in MSA tasks, specifically focusing on Multimodal Sarcasm Detection and Multimodal Sarcasm Explanation. Through comprehensive experiments, we identify key limitations, such as insufficient visual understanding and a lack of conceptual knowledge. To address these issues, we propose a training-free framework that integrates in-depth object extraction and external conceptual knowledge to improve the model's ability to interpret and explain sarcasm in multimodal contexts. The experimental results on multiple models show the effectiveness of our proposed framework. The code is available at https://github.com/cp-cp/LVLM-MSA.
Empirical Analysis on User Profile in Personalized LLMs
Utilizing user profiles to personalize Large Language Models (LLMs) has been shown to enhance performance on a wide range of tasks. However, the precise role of user profiles and the mechanism of their effect on LLMs remain unclear. This study first confirms that the effectiveness of user profiles stems primarily from their personalization information, with input-relevant information contributing meaningfully only when built upon personalization. Furthermore, we investigate how user profiles affect the personalization of LLMs. Within the user profile, we reveal that it is the historical personalized responses produced or approved by users that play a pivotal role in personalizing LLMs. This discovery unlocks the potential of LLMs to incorporate more user profiles within the constraints of limited input length. As for the position of user profiles, we observe that user profiles integrated into different positions of the input context do not contribute equally to personalization. Instead, user profiles closer to the beginning have more impact on the personalization of LLMs. Our findings reveal the role of user profiles in the personalization of LLMs and show how the way they are incorporated affects performance, offering guidance on leveraging user profiles effectively.
Fact or Facsimile? Evaluating the Factual Robustness of Modern Retrievers
Dense retrievers and rerankers are central to retrieval-augmented generation (RAG) pipelines, where accurately retrieving factual information is crucial for maintaining system trustworthiness and defending against RAG poisoning. However, little is known about how much factual competence these components inherit or lose from the large language models (LLMs) they are based on. We pair 12 publicly released embedding checkpoints with their original base LLMs and evaluate both sets on a factuality benchmark. Across every model evaluated, the embedding variants achieve markedly lower accuracy than their bases, with absolute drops ranging from 12 to 43 percentage points (median 28 pts) and typical retriever accuracies collapsing into the 25-35% band versus the 60-70% attained by the generative models. This degradation intensifies under a more demanding condition: when the candidate pool per question is expanded from four options to one thousand, the strongest retriever's top-1 accuracy falls from 33% to 26%, revealing acute sensitivity to distractor volume. Statistical tests further show that, for every embedding model, cosine-similarity scores between queries and correct completions are significantly higher than those for incorrect ones (p < 0.01), indicating decisions driven largely by surface-level semantic proximity rather than factual reasoning. To probe this weakness, we employed GPT-4.1 to paraphrase each correct completion, creating a rewritten test set that preserved factual truth while masking lexical cues, and observed that over two-thirds of previously correct predictions flipped to wrong, reducing overall accuracy to roughly one-third of its original level. Taken together, these findings reveal a systematic trade-off introduced by contrastive learning for retrievers: gains in semantic retrieval are paid for with losses in parametric factual knowledge, and the resulting models remain highly vulnerable to adversarial or even benign rephrasings. Our study underscores the need for retrieval objectives that balance similarity with factual fidelity to safeguard next-generation RAG systems against both misinformation and targeted attacks.
Dual Context-Aware Negative Sampling Strategy for Graph-based Collaborative Filtering
Negative sampling plays a critical role in collaborative filtering (CF), as it accelerates convergence and improves recommendation accuracy. Among recent studies, mixup-based negative sampling has shown promising performance. However, existing methods primarily focus on increasing the similarity between the synthesized negative and the positive item, without considering the false positive issue commonly found in implicit feedback scenarios. Blindly training all positive samples with overly hard negatives can magnify the impact of false positives and hurt recommendation performance. To address this challenge, we first provide a theoretical analysis revealing that mixup-synthesized hard negatives implicitly reweight the similarity difference between the user's interactions and both the positive and negative boundaries, thereby shaping the training signal. Motivated by this, we propose a novel strategy named Dual Context-Aware Negative Sampling (DCANS), which enhances each positive item by assessing its alignment with the user's interest context, and simultaneously adjusts the hardness of synthesized negatives based on their relevance to the same interest context. This strategy optimizes the training direction toward the user's genuine preferences, mitigating the negative impact of false positives while preserving the benefits of hard negative sampling. Extensive experiments on three benchmark datasets demonstrate that our method achieves consistent improvements over state-of-the-art baselines. Our PyTorch implementation is available at https://github.com/Wu-Xi/DCANS.
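For reference, a bare-bones mixup-style hard-negative synthesis is sketched below: the candidate negative most similar to the positive is interpolated toward the positive item embedding, with the mixing coefficient controlling hardness. DCANS additionally conditions both the positive weighting and this coefficient on the user's interest context, which is not shown; the function name and values are illustrative.

import torch
import torch.nn.functional as F

def synthesize_mixup_negative(pos_emb, neg_candidates, alpha=0.6):
    """Mixup-style hard negative: pick the candidate closest to the positive item,
    then interpolate it toward the positive embedding. A smaller alpha (less of the
    negative) yields a harder synthesized negative."""
    sims = F.cosine_similarity(neg_candidates, pos_emb.unsqueeze(0), dim=1)
    hardest = neg_candidates[sims.argmax()]
    return alpha * hardest + (1 - alpha) * pos_emb

pos = torch.randn(64)                 # positive item embedding
candidates = torch.randn(50, 64)      # sampled negative item embeddings
hard_neg = synthesize_mixup_negative(pos, candidates, alpha=0.6)
print(hard_neg.shape)                 # torch.Size([64])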
Harnessing Light for Cold-Start Recommendations: Leveraging Epistemic Uncertainty to Enhance Performance in User-Item Interactions
Most recent paradigms of generative model-based recommendation still face challenges related to the cold-start problem. Existing models addressing cold item recommendations mainly focus on acquiring more knowledge to enrich embeddings or model inputs. However, many models do not assess the efficiency with which they utilize the available training knowledge, leading to the extraction of significant knowledge that is not fully used, thus limiting improvements in cold-start performance. To address this, we introduce the concept of epistemic uncertainty (which refers to uncertainty caused by a lack of knowledge of the best model) to indirectly define how efficiently a model uses the training knowledge. Since epistemic uncertainty represents the reducible part of the total uncertainty, we can optimize the recommendation model further based on epistemic uncertainty to improve its performance. To this end, we propose a Cold-Start Recommendation based on Epistemic Uncertainty (CREU) framework. Additionally, CREU is inspired by Pairwise-Distance Estimators (PaiDEs) to efficiently and accurately measure epistemic uncertainty by evaluating the mutual information between model outputs and weights in high-dimensional spaces. The proposed method is evaluated through extensive offline experiments on public datasets, which further demonstrate the advantages and robustness of CREU. The source code is available at https://github.com/EsiksonX/CREU.
A Soft-partitioned Semi-supervised Collaborative Transfer Learning Approach for Multi-Domain Recommendation
In industrial practice, Multi-domain Recommendation (MDR) plays a crucial role. Shared-specific architectures are widely used in industrial solutions to capture shared and unique attributes via shared and specific parameters. However, with imbalanced data across different domains, these models face two key issues: (1) Overwhelming: Dominant domain data skews model performance, neglecting non-dominant domains. (2) Overfitting: Sparse data in non-dominant domains leads to overfitting in specific parameters. To tackle these challenges, we propose Soft-partitioned Semi-supervised Collaborative Transfer Learning (SSCTL) for multi-domain recommendation. SSCTL generates dynamic parameters to address the overwhelming issue, thus shifting focus towards samples from non-dominant domains. To combat overfitting, it leverages pseudo-labels with weights from dominant domain instances to enhance non-dominant domain data. We conduct comprehensive experiments, both online and offline, to validate the efficacy of our proposed method. Online tests yielded significant improvements across various domains, with increases in GMV ranging from 0.54% to 2.90% and enhancements in CTR ranging from 0.22% to 1.69%.
Exploring the Potential of Pre-Trained Language Models in Long-Term Semantic Scene Change Prediction Using Variable Scene Graphs
The 3D Variable Scene Graph (3DVSG) is a newly emerging representation for modeling dynamic environments, extending scene graphs by introducing a node-level property called variability, which quantifies the likelihood of semantic change over time. In this work, we explore the integration of pre-trained language models (PLMs) into variability estimation. This is of significant practical importance because variability estimation suffers from data scarcity and severe class imbalance. PLMs provide rich general semantic knowledge that can enhance representation learning in such settings. We systematically evaluate PLM embeddings across different graph neural networks (GNNs). We introduce template-based text structuring (TTS) to understand the effect of input formatting. Our experiments show that PLM embeddings significantly improve variability estimation performance, with effectiveness influenced by both embedding and GNN choices. We also demonstrate that text structure can significantly affect embedding quality. Lastly, we show that PLM embeddings yield reliable gains in variability estimation and downstream active change detection.
Enhancing Graph Collaborative Filtering with FourierKAN Feature Transformation
Graph Collaborative Filtering (GCF) has emerged as a dominant paradigm in modern recommendation systems, excelling at modeling complex user-item interactions and capturing high-order collaborative signals. Most existing GCF models predominantly rely on simplified graph architectures like LightGCN, which strategically remove feature transformation and activation functions from vanilla graph convolution networks. Through systematic analysis, we reveal that feature transformation in message propagation can enhance model representation, though at the cost of increased training difficulty. To this end, we propose FourierKAN-GCF, a novel framework that adopts Fourier Kolmogorov-Arnold Networks as efficient transformation modules within graph propagation layers. This design enhances model representation while decreasing training difficulty. Our FourierKAN-GCF can achieve higher recommendation performance than most widely used GCF backbone models and can be integrated into existing advanced self-supervised models as a backbone, replacing their original backbone to achieve enhanced performance. Extensive experiments on three public datasets demonstrate the superiority of FourierKAN-GCF.
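The sketch below shows one common way to realize a Fourier Kolmogorov-Arnold layer, expanding each input dimension into sin/cos features at several frequencies and learning a linear combination of them; such a layer could stand in for the feature transformation inside a graph propagation step. The grid size, initialization, and standalone usage are illustrative assumptions, not FourierKAN-GCF's exact module.

import torch
import torch.nn as nn

class FourierKANLayer(nn.Module):
    """Fourier KAN-style transformation: each input dimension is expanded into
    sin/cos features at frequencies 1..grid_size, and the output is a learned
    linear combination of these features (per output dimension)."""
    def __init__(self, in_dim, out_dim, grid_size=4):
        super().__init__()
        self.grid_size = grid_size
        # coefficients for cos and sin terms: [2, out_dim, in_dim, grid_size]
        self.coeffs = nn.Parameter(
            torch.randn(2, out_dim, in_dim, grid_size) / (in_dim * grid_size) ** 0.5
        )

    def forward(self, x):                               # x: [batch, in_dim]
        k = torch.arange(1, self.grid_size + 1, device=x.device, dtype=x.dtype)
        angles = x.unsqueeze(-1) * k                    # [batch, in_dim, grid_size]
        cos, sin = torch.cos(angles), torch.sin(angles)
        y = torch.einsum("big,oig->bo", cos, self.coeffs[0])
        return y + torch.einsum("big,oig->bo", sin, self.coeffs[1])

layer = FourierKANLayer(in_dim=64, out_dim=64)
print(layer(torch.randn(32, 64)).shape)   # torch.Size([32, 64])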
Multi-Item-Query Attention for Stable Sequential Recommendation
The inherent instability and noise in user interaction data challenge sequential recommendation systems. Prevailing masked attention models, relying on a single query from the most recent item, are sensitive to this noise, reducing prediction reliability. We propose the Multi-Item-Query attention mechanism (MIQ-Attn) to enhance model stability and accuracy. MIQ-Attn constructs multiple diverse query vectors from user interactions, effectively mitigating noise and improving consistency. It is designed for easy adoption as a drop-in replacement for existing single-query attention. Experiments show MIQ-Attn significantly improves performance on benchmark datasets.
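A minimal sketch of a multi-item-query attention block is given below: one query is built from each of the last m items rather than from the most recent item alone, each query attends over the full sequence, and the resulting context vectors are averaged. The projection layout and mean aggregation are illustrative assumptions, not necessarily MIQ-Attn's exact construction.

import torch
import torch.nn as nn

class MultiItemQueryAttention(nn.Module):
    """Build one query per each of the last m items (instead of a single query
    from the most recent item), attend over the full sequence with each query,
    and average the resulting context vectors."""
    def __init__(self, d_model, num_queries=4):
        super().__init__()
        self.num_queries = num_queries
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, seq):                                   # seq: [batch, len, d_model]
        d = seq.size(-1)
        queries = self.q_proj(seq[:, -self.num_queries:, :])  # [B, m, d]
        keys, values = self.k_proj(seq), self.v_proj(seq)     # [B, L, d]
        attn = torch.softmax(queries @ keys.transpose(1, 2) / d ** 0.5, dim=-1)
        contexts = attn @ values                               # [B, m, d]
        return contexts.mean(dim=1)                            # aggregated user representation

miq = MultiItemQueryAttention(d_model=64, num_queries=4)
print(miq(torch.randn(8, 20, 64)).shape)   # torch.Size([8, 64])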
In-context Pre-trained Time-Series Foundation Models adapt to Unseen Tasks
Time-series foundation models (TSFMs) have demonstrated strong generalization capabilities across diverse datasets and tasks. However, existing foundation models are typically pre-trained to enhance performance on specific tasks and often struggle to generalize to unseen tasks without fine-tuning. To address this limitation, we propose augmenting TSFMs with In-Context Learning (ICL) capabilities, enabling them to perform test-time inference by dynamically adapting to input-output relationships provided within the context. Our framework, In-Context Time-series Pre-training (ICTP), restructures the original pre-training data to equip the backbone TSFM with ICL capabilities, enabling adaptation to unseen tasks. Experiments demonstrate that ICTP improves the performance of state-of-the-art TSFMs by approximately 11.4% on unseen tasks without requiring fine-tuning.
Multi-Behavior Intent Disentanglement for Recommendation via Information Bottleneck Principle
In e-commerce, recommender systems help users find suitable products by leveraging diverse behaviors, e.g., view, cart and buy. In recent years, multi-behavior recommender systems have made strides by integrating auxiliary behaviors with purchase histories to deliver high-quality recommendations. However, most existing methods often fail to identify spurious correlation intents within auxiliary behaviors that conflict with users' target intents. Indiscriminately incorporating such correlations into the prediction of target intents may lead to performance degradation. Toward this end, we propose a Multi-Behavior Intent Disentanglement (MBID) framework based on Information Bottleneck (IB) principle, which focuses on disentangling spurious correlation intents in multi-behavior recommendations. In particular, we design a projection-based intent extraction method to decompose the genuine and spurious correlation intents in auxiliary behaviors. Building on this, we conceive an IB-based multi-intent learning task to disentangle the spurious correlation intents and transfer the genuine correlation intents from auxiliary behaviors into the target behavior, yielding high-quality target intent representations. Experiments on three real-world datasets show MBID significantly outperforms the state-of-the-art baselines by effectively disentangling the spurious correlation intents.
Forecasting the Buzz: Enriching Hashtag Popularity Prediction with LLM Reasoning
Hashtag trends ignite campaigns, shift public opinion, and steer millions of dollars in advertising spend, yet forecasting which tag will go viral remains elusive. Classical regressors digest surface features but ignore context, while large language models (LLMs) excel at contextual reasoning but misestimate numbers. We present BuzzProphet, a reasoning-augmented hashtag popularity prediction framework that (1) instructs an LLM to articulate a hashtag's topical virality, audience reach, and timing advantage; (2) utilizes these popularity-oriented rationales to enrich the input features; and (3) regresses on these inputs. To facilitate evaluation, we release HashView, a 7,532-hashtag benchmark curated from social media. Across diverse regressor-LLM combinations, BuzzProphet reduces RMSE by up to 2.8% and boosts correlation by 30% over baselines, while producing human-readable rationales. Results demonstrate that using LLMs as context reasoners rather than numeric predictors injects domain insight into tabular models, yielding an interpretable and deployable solution for social media trend forecasting.
Measuring Uncertainty in Medical Image Diagnosis via Conformal Focal Loss
Medical image diagnosis inherently involves uncertainty due to artifacts, occlusions, and ambiguous visual patterns, often leading to high inter-observer variability. While deep neural networks offer strong predictive performance, their outputs tend to be overconfident and poorly calibrated, limiting their clinical reliability. We propose Conformal Focal Loss (CFL), a principled approach that leverages the focal loss and the statistical validity of conformal prediction to better characterize diagnostic uncertainty. By emphasizing hard or ambiguous examples, CFL enables more accurate estimation of both predictive confidence and ambiguity. We evaluate CFL on diagnostic tasks using both clean and noise-augmented datasets, demonstrating its ability to effectively identify uncertain cases while maintaining robust classification performance under label noise.
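The sketch below pairs a standard focal loss with a split-conformal prediction set as one plausible reading of the combination described above; the exact coupling used in CFL is not reproduced here, and the calibration interface is an assumption.

```python
import numpy as np
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Multi-class focal loss: down-weights easy examples, emphasising hard or
    ambiguous ones."""
    log_probs = F.log_softmax(logits, dim=-1)
    logp_t = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = logp_t.exp()
    return (-(1.0 - pt) ** gamma * logp_t).mean()

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal quantile of the nonconformity score 1 - p(true class),
    computed on a held-out calibration set."""
    scores = np.sort(1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels])
    k = min(len(scores) - 1, int(np.ceil((len(scores) + 1) * (1 - alpha))) - 1)
    return scores[k]

def prediction_set(test_probs, q_hat):
    """Classes whose nonconformity score falls under the calibrated threshold;
    larger sets signal higher diagnostic uncertainty."""
    return [np.where(1.0 - p <= q_hat)[0] for p in test_probs]
```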
Advanced News Event Clustering via Topic Enhanced Modeling with Multi-Aspect Contrastive Learning
News event clustering, a crucial task for discovering and comprehending real-world information, aims to aggregate news articles into fine-grained clusters based on specific key events. Because clusters often contain topic-unrelated documents and individual documents carry redundant information, it is challenging to learn a discriminative document representation. To address this issue, we introduce a novel method, TECL (Topic Enhanced modeling with Contrastive Learning), that leverages topic-enhanced modeling with multi-aspect contrastive learning for news event clustering. The topic-enhanced modeling employs neural topic models to incorporate global and local semantics into document representations, while contrastive learning refines these representations from both inter-document and intra-document perspectives. Experiments conducted in both unsupervised and supervised scenarios indicate that the proposed method significantly improves performance, demonstrating its effectiveness.
HCLeK: Hierarchical Compression of Legal Knowledge for Retrieval-Augmented Generation
Prompt compression for Retrieval-Augmented Generation (RAG) often fails by treating all retrieved information uniformly. This undifferentiated approach neglects the critical distinction between foundational core knowledge and illustrative practical knowledge, a failure especially damaging in hierarchical domains like law where essential principles can be discarded for redundant details, diminishing information gain. To address this, we propose HCLeK, a Hierarchical Compression framework for Legal Knowledge. HCLeK uniquely leverages high-density core knowledge to guide the hierarchical compression of voluminous practical knowledge. The framework operates in three stages: (1) Core-Knowledge Guided Reranking to prioritize practical knowledge based on its semantic relevance to core legal principles; (2) Priority-Decay Budget Allocation to dynamically assign compression budgets, focusing on the most salient information; and (3) Relevance-Diversity Aware Semantic Compression for fine-grained sentence-level compression. Experimental results on the complex task of Legal Judgment Prediction (LJP) validate that HCLeK achieves state-of-the-art performance across various high compression ratios (0.5--0.05), demonstrating its effectiveness and robustness. Our code is available at https://github.com/fupanY/HCLeK.
USE-LTV: Customer LifeTime Value Prediction via Uncertain Sequence Modeling in Baidu Ads
Accurate customer LifeTime Value (LTV) predictions are of critical importance for evaluating the efficiency of customer management strategies and can enhance advertising placement for better decision-making in ad systems. However, existing solutions for LTV prediction usually rely on deterministic historical sequence data, which is challenging to apply in Baidu ads due to two unique features: (i) uncertain customer behavior sequences caused by dynamic advertising strategies, and (ii) the complex, long-tail LTV distribution caused by continuous customer behaviors and the unique business pattern of search and news feed ads in Baidu. To incorporate these new factors, we propose an Uncertain behavior Sequence modeling framework to predict customer LifeTime Value (USE-LTV), in which we (i) utilize a transformer module to extract uncertain sequence features and develop a dynamic weight mechanism to capture differentiated information under uncertain behavior sequences, and (ii) design a continuous loss function tailored to the real-world long-tail exponential LTV distribution in Baidu ads. We extensively evaluate our method on industrial-scale real-world data from Baidu, one of the world's largest ads platforms, and demonstrate that USE-LTV achieves an 11.64% NMAE improvement for one-year LTV compared to the state-of-the-art method.
RepMedGAN: Self-supervised Representation-guided Medical GAN for Label-free Medical Image Synthesis
Medical image synthesis addresses healthcare data scarcity by generating realistic samples for clinical support systems, AI training, and research. However, the field faces challenges due to the complexity of imaging data with its diverse modalities, characteristics, and disease variations. To produce high-quality images, medical image synthesis typically relies on conditional generation, where labels and annotations serve as essential conditions that provide critical guidance signals during the generation process to control desired semantics and fidelity. However, in the medical domain, labels are often inaccessible due to the high cost of annotation, requirements for clinical expertise, as well as ethical concerns. To address this critical challenge, we propose RepMedGAN, a novel self-supervised representation-guided image generation framework that enhances label-free medical image synthesis by leveraging self-supervised learning representations, enabling high-quality generation across different modalities without requiring labels or annotations. Our framework incorporates a Self-supervised Guidance Module that provides rich semantic knowledge during training and introduces a Guidance Representation Generator to bridge the train-inference disparity. Through extensive evaluation across four diverse medical datasets including brain MRI, chest X-ray, kidney CT, and eye glaucoma images, we demonstrate that RepMedGAN consistently achieves state-of-the-art results across multiple metrics and produces superior-quality medical images.
Rethinking Masked Image Modeling for Ultrasound Image Denoising
Ultrasound imaging serves as an important clinical diagnostic modality due to its non-invasive, radiation-free, and real-time capabilities. However, ultrasound images suffer from speckle noise that significantly compromises diagnostic accuracy and clinical interpretation. Traditional denoising methods are limited by speckle noise's signal-dependent nature, often removing important diagnostic features. While deep learning performs better, it requires large labelled datasets that are difficult to obtain due to privacy concerns and annotation costs. Self-supervised learning through masked image modeling (MIM) shows potential in addressing data scarcity, but conventional MIM, developed for high-level vision tasks, is unsuitable for low-level tasks like image denoising due to its framework architecture and learning strategy. To this end, we propose Image Denoising Masked Image Modeling (ID-MIM), the first MIM framework for ultrasound image denoising. ID-MIM incorporates a novel high-frequency oriented dual-branch masking and a specialized learning objective for noise reduction. Our encoder-only architecture features a multi-scale hierarchical transformer with dynamic skip connections, where the encoder directly performs denoising rather than relying on separate decoder reconstruction as in conventional MIM approaches. Extensive experiments demonstrate the superior performance of our ID-MIM framework across diverse noise scenarios, establishing new state-of-the-art results.
Can LLMs Really Help Query Understanding In Web Search? A Practical Perspective
As a core module of web search, query understanding aims to bridge the semantic gap between user queries and web page documents, thereby enhancing the ability to deliver more relevant results. Recently, Large Language Models (LLMs) have achieved significant breakthroughs that have fundamentally altered the workflow of existing search ranking tasks. However, few researchers have explored the integration of LLMs into the field of query understanding. In this paper, we investigate the potential of LLMs in query understanding by conducting a comprehensive evaluation across three dimensions: term, structure, and topic. This evaluation includes several representative tasks such as segmentation, term weighting, error correction, query expansion, and intent recognition. The experimental results reveal that LLMs are particularly effective in query expansion and intent recognition but show limited improvement in other areas. This limitation may be attributed to LLMs' primary focus on modeling the semantic knowledge of entire queries, while lacking the capability to capture token-level information with finer granularity. Additionally, we explore potential practical applications of LLMs in query understanding, such as integrating the evaluation and training capabilities of smaller models with LLMs and constructing unsupervised samples. Based on comprehensive empirical results, collaborative training emerges as a promising approach to leverage LLMs for query understanding. We hope this research will advance the practical application of LLMs in query understanding and contribute to the development of this field.
AR2: Adversarial Reinforcement Learning for Abstract Reasoning in Large Language Models
Abstraction--the ability to recognize and distill essential computational patterns from complex problem statements--is a foundational skill in computer science, critical both for human problem-solvers and coding-oriented large language models (LLMs). Despite recent advances in training LLMs for code generation using reinforcement learning (RL), most existing approaches focus primarily on superficial pattern recognition, overlooking explicit training for abstraction. In this study, we propose AR2 (Adversarial Reinforcement Learning for Abstract Reasoning), a novel framework explicitly designed to enhance the abstraction abilities of LLMs. AR2 employs a teacher model to transform kernel problems into narrative-rich, challenging descriptions without changing their fundamental logic. Simultaneously, a student coding model is trained to solve these complex narrative problems by extracting their underlying computational kernels. Experimental results demonstrate that AR2 substantially improves the student model's accuracy on previously unseen, challenging programming tasks, underscoring abstraction as a key skill for enhancing LLM generalization.
MOVER: Multimodal Optimal Transport with Volume-based Embedding Regularization
Recent advances in multimodal learning have largely relied on pairwise contrastive objectives to align different modalities, such as text, video, and audio, in a shared embedding space. While effective in bi-modal setups, these approaches struggle to generalize across multiple modalities and often lack semantic structure in high-dimensional spaces. In this paper, we propose MOVER, a novel framework that combines optimal transport-based soft alignment with volume-based geometric regularization to build semantically aligned and structured multimodal representations. By integrating a transport-guided matching mechanism with a geometric volume minimization objective (GAVE), MOVER encourages consistent alignment across all modalities in a modality-agnostic manner. Experiments on text-video-audio retrieval tasks demonstrate that MOVER significantly outperforms prior state-of-the-art methods in both zero-shot and finetuned settings. Additional analysis shows improved generalization to unseen modality combinations and stronger structural consistency in the learned embedding space.
LLM-based Interactive Coding Education via Predictive Query Management and Student-Centered Fine-Tuning: Design and Implementation with 1500-Student Class Data
Large-scale university courses face significant challenges. Teaching assistants are overwhelmed by the large number of student questions, limiting their ability to provide detailed and individualized support. As a result, students, especially those who are struggling, receive less tailored assistance, further widening gaps in academic performance. To address these challenges, we propose the student-centered AI learning assistant (SCALA), a large language model (LLM)-based interactive tutoring system that incorporates student needs and learning expectations. SCALA consists of two main components: predictive query management and student-centered fine-tuning. The first component anticipates common student questions via LLM agent debate: each agent reasons over a combination of lecture content and student interactions in chat logs, and the agents collaboratively predict what students will likely ask. Presenting these relevant queries to students guides and fosters their learning. The second component is fine-tuned on a 14k-question Python-tutoring dataset, curated based on in-depth student interviews to reflect real learning expectations. Our real-world experiments with 1500-student large-scale Python classes demonstrate that SCALA delivers more helpful and accurate responses compared to closed-source models (e.g., GPT-4o), while significantly reducing latency.
CPSRank: Unsupervised Keyphrase Extraction via Contextual Perturbation
The importance of a phrase within a document becomes most evident through its absence rather than its presence. Inspired by this observation, we redefine keyphrases as those whose removal most disrupts the document's meaning. Traditional unsupervised methods typically rely on document-level signals, such as term frequency or phrase-to-document similarity, which overlook the contextual contribution of a phrase. This paper proposes CPSRank, an unsupervised keyphrase extraction method that evaluates the semantic importance of candidate phrases via a contextual perturbation score (CPS). The CPS quantifies the critical role of each phrase by combining contextual perturbation and content loss. CPSRank outperforms existing baselines in terms of F1 scores while providing deeper insights into the semantic value of keyphrases. We release our code at https://github.com/Splo2t/CPSRank.
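A hypothetical sketch of a removal-based importance score in the spirit of CPS, assuming a generic sentence-embedding callable `embed`; the paper's score additionally combines a content-loss term not modelled here.

```python
import numpy as np

def contextual_perturbation_score(doc, phrase, embed):
    """Embed the document with and without the candidate phrase and measure how
    far the representation moves; a larger shift suggests a more important phrase."""
    perturbed = doc.replace(phrase, " ")
    e_full, e_pert = embed(doc), embed(perturbed)
    cos = float(np.dot(e_full, e_pert) /
                (np.linalg.norm(e_full) * np.linalg.norm(e_pert)))
    return 1.0 - cos

def rank_candidates(doc, candidates, embed):
    """Rank candidate phrases by their perturbation score (highest first)."""
    return sorted(candidates,
                  key=lambda p: contextual_perturbation_score(doc, p, embed),
                  reverse=True)
```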
Query, Decompose, Compress: Structured Query Expansion for Efficient Multi-Hop Retrieval
Large Language Models (LLMs) have been increasingly employed for query expansion. However, their generative nature often undermines performance on complex multi-hop retrieval tasks by introducing irrelevant or noisy information. To address this challenge, we propose DeCoR (Decompose and Compress for Retrieval), a framework grounded in structured information refinement. Rather than generating additional content, DeCoR strategically restructures the query's underlying reasoning process and distills supporting evidence from retrieved documents. It consists of two core components tailored to the challenges of multi-hop retrieval: (1) Query Decomposition, which decomposes a complex query into explicit reasoning steps, and (2) Query-aware Document Compression, which synthesizes dispersed evidence from candidate documents into a concise summary relevant to the query. This structured design ensures that the final query representation remains both robust and comprehensive. Experimental results demonstrate that, despite utilizing a relatively small LLM, DeCoR outperforms strong baselines that rely on larger models. This finding underscores that, in complex retrieval scenarios, carefully leveraging the reasoning and summarization capabilities of LLMs offers a more efficient and effective solution than relying solely on their generative capability.
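An illustrative sketch of the query-decomposition step, assuming a generic `llm` text-completion callable and a made-up prompt; DeCoR's actual prompts and its compression stage are not reproduced here.

```python
def decompose_query(query, llm):
    """Ask a small LLM to split a multi-hop question into explicit single-hop
    sub-questions, one per line; `llm` is any prompt-in/text-out callable."""
    prompt = (
        "Break the question into the minimal sequence of single-hop "
        "sub-questions, one per line, needed to answer it.\n"
        f"Question: {query}"
    )
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]
```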
Robust Handwritten Text Recognition via Multi-Source Adversarial Domain Adaptation for Low-Resource Scripts
Transformer-based Optical Character Recognition (OCR) models perform well on common tasks but struggle to generalize to low-resource settings, handwritten text, or diverse scripts such as Jawi. Traditional adaptation methods often require labeled target data or fall short under severe domain shifts. We tackle this challenge with Unsupervised Multi-Source Heterogeneous Domain Adaptation (UMSHDA) for Handwritten Text Recognition (HTR). We adapt Domain-Adversarial Neural Networks (DANN) and Prototype Learning for Adversarial Domain Adaptation (PLADA) within a Transformer-based OCR architecture. We introduce two novel multi-adversarial adaptation strategies: (1) an Additive strategy that jointly optimizes a domain classification loss and a prototype-based alignment loss, and (2) an Integrative strategy that uses a prototype-driven adversarial signal to augment the domain classifier with semantic constraints. Experimental results show that our methods significantly outperform non-adaptive and DANN baselines on challenging handwriting tasks. For Jawi handwriting, our models achieve a relative error reduction of up to 22%. We also demonstrate superior cross-script generalization to German handwriting, achieving near-perfect performance, with a CER reduced by more than 50% relative to the DANN baseline. This work demonstrates that tailored multi-adversarial domain adaptation can effectively bridge significant domain gaps, enabling robust recognition accuracy for complex, low-resource HTR.
Ultra Fast Warm Start Solution for Graph Recommendations
In this work, we present a fast and effective linear approach for updating recommendations in UltraGCN, a scalable graph-based recommender system. Solving this task is extremely important for maintaining the relevance of recommendations under a large influx of new data and changing user preferences. To address this issue, we adapt a simple yet effective low-rank approximation approach to the graph-based model. Our method delivers instantaneous recommendations that are up to 30 times faster than conventional methods, with gains in recommendation quality, and demonstrates high scalability even on large-catalogue datasets.
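As a generic illustration of low-rank warm starting (not UltraGCN's specific update rule), the fold-in sketch below projects new or updated user rows onto frozen item factors obtained from a truncated SVD, assuming a SciPy sparse interaction matrix.

```python
import numpy as np
from scipy.sparse.linalg import svds

def lowrank_warm_start(interactions, new_rows, rank=64):
    """Factorise the existing (sparse) interaction matrix once offline, then
    project new or updated user rows onto the frozen item factors ("fold-in")
    so their recommendations refresh instantly without retraining."""
    u, s, vt = svds(interactions.asfptype(), k=rank)   # offline factorisation
    item_factors = vt.T                                # (num_items, rank)
    new_user_factors = new_rows @ item_factors         # fold-in of fresh rows
    return new_user_factors @ item_factors.T           # predicted affinities
```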
Uncertainty Quantification for Multiple-Choice Questions is Just One-Token Deep
Multiple-choice question (MCQ) benchmarks such as MMLU and GPQA are widely used to assess the capabilities of large language models (LLMs). While accuracy remains the standard evaluation metric, recent work has introduced uncertainty quantification (UQ) methods, such as entropy, conformal prediction, and verbalized confidence, as complementary measures of model reliability and calibration. However, we find that these UQ methods, when applied to MCQ tasks, are unexpectedly fragile. Specifically, we show that fine-tuning a model on just 1,000 examples to adjust the probability of the first generated token, under the common prompting setup where the model is instructed to output only a single answer choice, can systematically distort a broad range of UQ methods across models, prompts, and domains, all while leaving answer accuracy unchanged. We validate this phenomenon through extensive experiments on five instruction-tuned LLMs, tested under standard prompting, zero-shot chain-of-thought reasoning, and a biomedical question answering setting. In all cases, models retain similar accuracy but exhibit significantly degraded calibration. These results suggest that current UQ practices for MCQs are ''one-token deep'', driven more by first-token decoding behavior than by any deeper representation of uncertainty, and are easily manipulated through minimal interventions. Our findings call for more robust and interpretable approaches to uncertainty estimation, particularly in structured formats like MCQs, where confidence signals are often reduced to token-level heuristics.
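For concreteness, the snippet below computes the kind of first-token entropy signal the paper argues is fragile, given log-probabilities of the answer-letter tokens returned by any LLM API (the input format is an assumption).

```python
import math

def first_token_entropy(option_logprobs):
    """Entropy of the answer-letter distribution implied by the first generated
    token, e.g. {"A": -0.1, "B": -2.5, ...}; renormalised over the option tokens."""
    probs = [math.exp(lp) for lp in option_logprobs.values()]
    z = sum(probs)
    probs = [p / z for p in probs]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

print(first_token_entropy({"A": -0.1, "B": -2.5, "C": -3.0, "D": -3.2}))
```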
PMTA: Perception-Aware Multi-Task Transformer Network for Personalized Multi-Domain Adaptation
The escalating complexity of industrial recommendation systems, characterized by diverse user behaviors and cross-domain application scenarios, necessitates advanced multi-task and multi-domain learning paradigms. Existing methods often struggle with efficient knowledge transfer across tasks and domains due to semantic gaps and distribution shifts. To address these challenges, we propose the Perception-Aware Multi-Task Transformer Network for Personalized Multi-Domain Adaptation (PMTA), a unified framework that integrates three key innovations: First, the Task Prompt Encoding (TPE) module dynamically generates prompts by synthesizing personalized user data with task-specific information. Second, the Transformer-based Multi-Task Perception (TMPN) network enables adaptive cross-task knowledge transfer through attention mechanisms. Third, the Multi-Domain Adaptation (MDAN) component captures domain-specific behavior patterns via learnable prior information. Experimental results demonstrate PMTA's effectiveness, achieving a 0.168% increase in watch time and significant improvements in engagement metrics (AAD: +0.0113%, AAH: +0.0608%). Deployed on Douyin and Douyin Lite, it significantly improves recommendation quality and drives commercial success.
PrLM: Learning Explicit Reasoning for Personalized RAG via Contrastive Reward Optimization
Personalized retrieval-augmented generation (RAG) aims to produce user-tailored responses by incorporating retrieved user profiles alongside the input query. Existing methods primarily focus on improving retrieval and rely on large language models (LLMs) to implicitly integrate the retrieved context with the query. However, such models are often sensitive to retrieval quality and may generate responses that are misaligned with user preferences. To address this limitation, we propose PrLM, a reinforcement learning framework that trains LLMs to explicitly reason over retrieved user profiles. Guided by a contrastively trained personalization reward model, PrLM effectively learns from user responses without requiring annotated reasoning paths. Experiments on three personalized text generation datasets show that PrLM outperforms existing methods and remains robust across varying numbers of retrieved profiles and different retrievers.
Global-Distribution Aware Scenario-Specific Variational Representation Learning Framework
Current recommendation methods typically use a unified framework to offer personalized recommendations for different scenarios provided by commercial platforms. However, they often employ shared bottom representations, which partially hinders the model's capacity to capture scenario uniqueness. Ideally, users and items should exhibit specific characteristics in different scenarios, prompting the need to learn scenario-specific representations to differentiate scenarios. Yet, variations in user and item interactions across scenarios lead to data sparsity issues, impeding the acquisition of scenario-specific representations. To learn robust scenario-specific representations, we introduce a Global-Distribution Aware Scenario-Specific Variational Representation Learning Framework (GSVR) that can be directly applied to existing multi-scenario methods. Specifically, considering the uncertainty stemming from limited samples, our approach employs a probabilistic model to generate scenario-specific distributions for each user and item in each scenario, estimated through variational inference (VI). Additionally, we introduce the global knowledge-aware multinomial distributions as prior knowledge to regulate the learning of the posterior user and item distributions, ensuring similarities among distributions for users with akin interests and items with similar side information. This mitigates the risk of users or items with fewer records being overwhelmed in sparse scenarios. Extensive experimental results affirm the efficacy of GSVR in learning more robust representations.
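A minimal sketch of a scenario-specific variational embedding with a reparameterised Gaussian posterior regularised toward a global prior; GSVR's actual priors are multinomial and knowledge-aware, so this Gaussian version is only an illustrative simplification with assumed shapes.

```python
import torch
import torch.nn as nn

class ScenarioVariationalEmbedding(nn.Module):
    """Each (user, scenario) pair gets a Gaussian posterior; samples are drawn
    with the reparameterisation trick and pulled toward a global prior mean."""
    def __init__(self, num_users, num_scenarios, dim):
        super().__init__()
        self.num_scenarios = num_scenarios
        self.mu = nn.Embedding(num_users * num_scenarios, dim)
        self.logvar = nn.Embedding(num_users * num_scenarios, dim)

    def forward(self, user, scenario, prior_mu):
        idx = user * self.num_scenarios + scenario
        mu, logvar = self.mu(idx), self.logvar(idx)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterise
        # KL( N(mu, sigma^2) || N(prior_mu, I) ) keeps sparse scenarios anchored
        kl = 0.5 * ((mu - prior_mu).pow(2) + logvar.exp() - 1.0 - logvar).sum(-1)
        return z, kl.mean()
```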
SAKG: Structure-Aware Large Language Model Framework for Knowledge Graph Reasoning
Existing approaches to leveraging knowledge graphs in large language models often lack explicit structural modeling, which can lead to hallucinations and unstable reasoning over graph data. To address this, we propose SAKG, a structure-aware prompting framework designed to enhance the alignment between knowledge graph representations and language model inference. SAKG employs a hierarchical prompting strategy that integrates explicit task instructions, structural embeddings, and selectively filtered neighbor context. In particular, we introduce a progressive neighbor selection mechanism that combines relation co-occurrence statistics with embedding-based semantic similarity, ensuring that only informative and relevant neighbors are included in the prompt. This design enables the model to better capture relational semantics, structural dependencies, and contextual cues within the graph. Experimental results on multiple benchmarks demonstrate that SAKG consistently improves the effectiveness and factual consistency of knowledge graph reasoning with large language models.
Relation-Sensitive Visual Aggregation Enhances Multimodal Knowledge Graph Completion
Existing multimodal knowledge graph completion methods often overlook triple correlations between relations and images, limiting the expressiveness of multimodal embeddings. In this paper, we categorize triple correlations into Intra-triple Correlations (IaC) and Inter-triple Correlations (IeC), and propose a method called Relation-Sensitive Visual Aggregation (RSVA) to explicitly model them. Specifically, RSVA consists of two modules. The first is the Visual Semantic Aggregation Module, aggregating visual features for the central entity considering IaC. The second is the Contextual Neighbor Aggregation Module, capturing IeC by aggregating visual semantics from neighboring entities. In link prediction experiments, RSVA demonstrates the effect of IaC and IeC on the embeddings of central entities and achieves improved performance compared with previous approaches. These results demonstrate the effectiveness of RSVA, indicating that explicitly modeling the latent correlations between relations and images can enhance the representational capability of multimodal knowledge graphs.
Interpretable Meta-weighting Sparse Neural Additive Networks for Datasets with Label Noise and Class Imbalance
Black-box neural networks are inherently inscrutable, and their widespread use has triggered significant societal issues in crucial areas such as healthcare, finance and safety. In these high-stakes decision-making domains, the deployment of machine learning algorithms requires not only prediction accuracy but also their interpretability and robustness against data distribution shifts, such as outliers, label noise, and category imbalance. In this work, we propose a novel Meta-weighted Sparse Neural Additive Model (MSpNAM), which offers robustness through an efficient bilevel weighting policy and inherits strong explainability and representation capabilities from the additive modeling strategy. Furthermore, empirical results across multiple synthetic and real datasets, under various distribution shifts, demonstrate that MSpNAM can scale effectively and achieve superior performance in terms of robustness, interpretability, and anti-forgetting compared to some of the latest baselines.
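For readers unfamiliar with the additive backbone, the sketch below is a plain neural additive model (one small network per feature, contributions summed); MSpNAM's meta-weighting and sparsity components are omitted and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class NeuralAdditiveModel(nn.Module):
    """One small network per input feature; the prediction is the sum of the
    per-feature contributions, so each feature's effect can be inspected."""
    def __init__(self, num_features, hidden=16):
        super().__init__()
        self.feature_nets = nn.ModuleList([
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(num_features)
        ])
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):                                   # x: (batch, num_features)
        contribs = [net(x[:, i:i + 1]) for i, net in enumerate(self.feature_nets)]
        return torch.cat(contribs, dim=1).sum(dim=1, keepdim=True) + self.bias
```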
A Robust Entity Alignment Method based on Knowledge Distillation with Noisy Aligned Pairs
Entity alignment (EA) aims to find the same entities in different knowledge graphs. Existing EA methods assume that the supervised aligned pairs are noise-free; in practice, noisy pairs degrade EA performance. To this end, we propose a robust EA method based on knowledge distillation for noisy pairs. First, a dual-teacher model with online distillation is designed, in which a noise discriminator improves the noise resistance of the teacher models. Second, a student model is distilled offline from the dual-teacher model without using the noisy supervised pairs, further enhancing the robustness of the student model. In addition, entity structure is combined with entity representation for alignment inference to alleviate the bias of entity representations in noisy environments. Extensive experiments demonstrate the effectiveness of the proposed method.
Student-Augmented Self-Training with Closed Loop Feedback in Linguistic Steganalysis
As a countermeasure to linguistic steganography, linguistic steganalysis aims to distinguish between texts containing hidden secret messages (stego) and natural texts (cover). Current semi-supervised steganalysis approaches rely on a self-training mechanism. However, in linguistic steganalysis, pseudo-label errors propagate through iterative training, causing the student model to reinforce incorrect stego-distribution associations, thereby impairing its discriminative ability for subtle linguistic perturbations. To address this challenge, we propose SALT-LS, a self-training linguistic steganalysis framework that integrates a closed feedback loop between student and teacher models alongside a dual-constraint mechanism to improve pseudo-labels. Unlike a conventional semi-supervised steganalysis approach, we compute prototype penalties from both same-class (rather than intra-class) and cross-class perspectives, enabling more effective use of labeled data. Furthermore, we introduce an advanced-updating strategy for the student model, which is combined with the dual-constraint mechanism, forming a closed feedback loop that continuously refines the teacher model's pseudo-label generation for robust steganalysis performance. Extensive experiments on six datasets, including widely used steganographic strategies and corpora, demonstrate that SALT-LS outperforms state-of-the-art models. Our code is available.
Non-autoregressive Generative Auction with Global Externalities for Online Advertising
Online advertising auctions play a critical role in internet commerce, requiring mechanisms that maximize revenue while ensuring incentive compatibility, user experiences, and real-time efficiency. Existing learning-based auction frameworks advance contextual modeling by considering intra-list dependencies among ads, but still face challenges of insufficient global externality modeling and inefficiencies due to sequential processing. In this paper, we propose the Non-autoregressive Generative Auction with global externalities (NGA), a novel end-to-end auction framework for industrial online advertising. NGA explicitly models global externalities by jointly encoding dependencies among ads and the influence of neighboring organic content. To achieve real-time efficiency, NGA employs a non-autoregressive, constraint-based decoding mechanism and a parallel multi-tower evaluator that unifies list-wise reward and payment computation. Extensive offline experiments and large-scale online A/B tests on commercial advertising platforms demonstrate that NGA achieves superior performance in both effectiveness and efficiency compared to the state-of-the-art baselines.
MissDDIM: Deterministic and Efficient Conditional Diffusion for Tabular Data Imputation
Diffusion models have recently emerged as powerful tools for missing data imputation by modeling the joint distribution of observed and unobserved variables. However, existing methods, typically based on stochastic denoising diffusion probabilistic models (DDPMs), suffer from high inference latency and variable outputs, limiting their applicability in real-world tabular settings. To address these deficiencies, we present in this paper MissDDIM, a conditional diffusion framework that adapts Denoising Diffusion Implicit Models (DDIM) for tabular imputation. While stochastic sampling enables diverse completions, it also introduces output variability that complicates downstream processing. MissDDIM replaces this with a deterministic, non-Markovian sampling path, yielding faster and more consistent imputations. To better leverage incomplete inputs during training, we introduce a self-masking strategy that dynamically constructs imputation targets from observed features, enabling robust conditioning without requiring fully observed data. Experiments on five benchmark datasets demonstrate that MissDDIM matches or exceeds the accuracy of state-of-the-art diffusion models, while significantly improving inference speed and stability. These results highlight the practical value of deterministic diffusion for real-world imputation tasks.
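A sketch of one deterministic DDIM reverse step restricted to missing entries, assuming a noise-prediction network `eps_model`, an observation mask, and precomputed cumulative alphas; the conditioning interface is hypothetical rather than MissDDIM's exact API.

```python
import torch

@torch.no_grad()
def ddim_impute_step(x_t, x_obs, mask, t, t_prev, eps_model, alpha_bar):
    """One deterministic (eta = 0) DDIM reverse step for imputation: denoise the
    whole row, then clamp observed positions (mask == 1) back to their values."""
    eps = eps_model(x_t, t, x_obs, mask)                     # predicted noise
    a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]
    x0_pred = (x_t - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()  # implied clean sample
    x_prev = a_prev.sqrt() * x0_pred + (1.0 - a_prev).sqrt() * eps
    return torch.where(mask.bool(), x_obs, x_prev)
```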
Adversarially Attacking Graph Properties and Sparsification in Graph Learning
Graph neural networks and graph transformers explicitly or implicitly rely on fundamental properties of the underlying graph, such as spectral properties and shortest-path distances. However, it is still not clear how vulnerable these graph properties are to adversarial attacks and what impact such attacks have on downstream graph learning. Moreover, while graph sparsification has been used to reduce the computational cost of learning over graphs, its susceptibility to adversarial attacks has not been studied. In this paper, we study adversarial attacks on graph properties and graph sparsification and their impacts on downstream graph learning, paving the way toward protecting against these potential attacks. Our proposed methods are effective in attacking spectral properties, shortest distances, and graph sparsification, as demonstrated in our experimental evaluation.
Asymmetric Diffusion Recommendation Model
Recently, motivated by the outstanding achievements of diffusion models, the diffusion process has been employed to strengthen representation learning in recommendation systems. Most diffusion-based recommendation models typically utilize standard Gaussian noise in symmetric forward and reverse processes in continuous data space. Nevertheless, the samples derived from recommendation systems inhabit a discrete data space, which is fundamentally different from the continuous one. Moreover, Gaussian noise has the potential to corrupt personalized information within latent representations. In this work, we propose a novel and effective method, named Asymmetric Diffusion Recommendation Model (AsymDiffRec), which learns forward and reverse processes in an asymmetric manner. We define a generalized forward process that simulates the missing features in real-world recommendation samples. The reverse process is then performed in an asymmetric latent feature space. To preserve personalized information within the latent representation, a task-oriented optimization strategy is introduced. In the serving stage, the raw sample with missing features is regarded as a noisy input to generate a denoised and robust representation for the final prediction. By equipping base models with AsymDiffRec, we conduct online A/B tests, achieving improvements of +0.131% and +0.166% in terms of users' active days and app usage duration respectively. Additionally, the extended offline experiments also demonstrate improvements. AsymDiffRec has been implemented in the Douyin Music App.
Active Recommendation for Email Outreach Dynamics
Email outreach remains a cornerstone of modern marketing, enabling direct, timely communication. However, this strategy faces significant personalization challenges, since new campaigns typically lack historical interaction data and rich side information. In this work, we propose a framework that combines collaborative-filtering (CF) signals derived from a shallow autoencoder (SAE) with a Thompson Sampling-based multi-armed bandit to dynamically select small batches of recipients for each email template. We show SAEs help balance exploration and exploitation by quantifying recipient informativeness and confidence, enabling efficient personalization without retraining during active learning. To facilitate reproducibility and future research, we release a large dataset of almost 15 million recipient-message interactions, offering new insights into email outreach dynamics for CF. Our experiments show that our method outperforms multiple baselines in retrieval metrics while retaining interpretable model components.
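A minimal Beta-Bernoulli Thompson Sampling sketch over arms; the paper's bandit additionally selects recipient batches per email template and folds in SAE-derived collaborative-filtering signals, which are not modelled here.

```python
import numpy as np

class BetaThompsonBandit:
    """Beta-Bernoulli Thompson Sampling: sample a plausible click rate per arm,
    play the arm with the highest draw, then update with the observed outcome."""
    def __init__(self, num_arms):
        self.alpha = np.ones(num_arms)   # pseudo-counts of successes
        self.beta = np.ones(num_arms)    # pseudo-counts of failures

    def select(self):
        return int(np.argmax(np.random.beta(self.alpha, self.beta)))

    def update(self, arm, clicked):
        if clicked:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0
```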
Structuring Video Semantics with Temporal Triplets for Zero-Shot Video Question Answering
Current large vision-language models (VLMs) exhibit remarkable performance in basic video understanding tasks. However, existing VLMs are still limited to surface-level perception and lack fine-grained spatio-temporal understanding and combinatorial reasoning capabilities. Existing methods typically rely on expensive human annotations or subtitle extraction, yet they struggle to effectively model temporal relations between frames. This paper proposes a structured representation based on temporal triplets to address two major challenges in traditional approaches: temporal fragmentation and entity reference ambiguity. By modeling objects, attributes, and relationships within the video and incorporating temporal information, we convert semantic content from keyframes into a sequence of temporal triplets. This structured representation is then used as input for zero-shot video question answering (VideoQA). Experiments were conducted on four benchmark VideoQA datasets: NExT-QA, STAR, MSVD-QA, and MSRVTT-QA, showing that our method achieves competitive performance without requiring fine-tuning, validating its generality and effectiveness.
SESSION: Applied Research Papers
EnhanceMyPrompt: Rewriting Chat Queries for Effective Response Generation from LLMs
Short and ambiguous queries in chat interfaces like Microsoft Copilot often lead to vague or irrelevant LLM responses, increasing task completion time. Hence, we introduce a novel problem: semi-automatically enhancing such queries/prompts into specific, well-formed ones with clear intent. Unlike prompt optimization, our approach adds relevant sub-intents or constraints rather than just rewording for brevity. We propose EnhanceMyPrompt, which uses small language models (SLMs) to enrich prompts by adding sub-intents/constraints, suggesting placeholders, and recommending popular values. We also introduce metrics to measure prompt improvement, user effort, and LLM response quality. Experiments on a proprietary Microsoft Copilot dataset and the LMSYS+NQ dataset with four SLMs show effectiveness: EnhanceMyPrompt predicts user intents up to 3 turns ahead in ~23% of conversations, enabling efficient sessions. Code, prompts, data, and models for LMSYS+NQ are publicly available.
FR-LoRA: Fisher Regularized LoRA for Multilingual Continual Learning
Relevance in e-commerce product search is critical to ensuring that results accurately reflect customer intent. While large language models (LLMs) have recently advanced natural language processing capabilities, their high inference latency and significant infrastructure demands make them less suitable for real-time e-commerce applications. Consequently, transformer-based encoder models are widely adopted for relevance classification tasks. These models typically evaluate the relevance of a product to a given query by encoding the query and product title as input features. As e-commerce stores expand into new marketplaces, the need for language- and region-specific relevance models grows, often resulting in the sequential development and maintenance of separate models per marketplace. To address this challenge, we introduce a multilingual continual learning (CL) framework that mitigates catastrophic forgetting. Our proposed method, FR-LoRA (Fisher Regularized LoRA), integrates Elastic Weight Consolidation (EWC) with marketplace-specific LoRA modules, where each LoRA is regularized using the Fisher information matrix. FR-LoRA retains the same inference-time footprint as the base model, ensuring zero additional latency while enabling frequent, scalable updates. Empirically, our approach achieves a ~3% ROC-AUC improvement over single-marketplace baselines and outperforms several recent CL baselines on both proprietary and public datasets.
Uncovering Corporate Influence: A First Scalable Method for Qualifying Holdings Computation
Distributed logic-based reasoning has recently emerged as a powerful, explainable, and auditable approach for solving complex problems across a wide range of domains. Based on our experience in building company ownership graphs at the Bank of Italy, this paper presents a distributed reasoning solution to the qualifying holding problem. Qualifying holdings measure an entity's influence over a bank or financial intermediary and serve as a key metric in banking supervision. Their calculation is particularly challenging due to the large scale of real-world ownership networks and their structural complexity---the latter often not fully addressed by existing regulatory frameworks. To our knowledge, no standardised, efficient computational approach that also faithfully reflects regulatory interpretations has ever been proposed, leaving a significant gap between legal requirements and practical capabilities. We fill this gap by proposing the first mathematical formalisation of the qualifying holding problem, proving it to be inherently hard (#P-complete). Despite its general intractability, we develop a logic-based reasoning algorithm based on Datalog± that enables efficient and parallel computation of qualifying holdings across real-world scenarios. Extensive experiments confirm its practical effectiveness and scalability.
Pantheon: Personalized Multi-objective Ensemble Sort via Iterative Pareto Policy Optimization
To provide promising recommendation results, the industrial RecSys chain comprises three major stages to support our service: (1) the Retrieval model searches hundreds of item candidates; (2) the Ranking model estimates multiple aspect probabilities (Pxtrs) for each retrieved item; (3) the Ensemble Sort stage merges those Pxtrs into one comparable score and selects the dozen items with the highest scores to recommend. To our knowledge, the widely accepted industry ensemble sort approach still relies on manual formula-based adjustment, i.e., assigning manual weights to the Pxtrs to control their influence on the fusion score. Under this framework, the RecSys relies heavily on expert knowledge to determine a satisfactory weight for each Pxtr, which blocks further advances. In this paper, we present Pantheon, a practical neural-network-based ensemble sort. Compared with formulation-based ensemble sort, Pantheon has the following advantages: (1) Personalized joint training: Pantheon is jointly trained with the real-time ranking model, which captures ever-changing personalized user interests accurately. (2) Representation inheritance: instead of the highly compressed Pxtrs, Pantheon utilizes the fine-grained hidden states as model input, benefiting from the Ranking model to enhance model capacity. Meanwhile, to reach a balanced multi-objective ensemble sort, we further devise an iterative Pareto policy optimization (IPPO) strategy that considers the multiple objectives at the same time. To our knowledge, this is the first work to replace the entire formulation-based ensemble sort in an industrial RecSys; Pantheon is fully deployed in Kuaishou live-streaming services, serving 400 million users daily.
T-Stars-Poster: A Framework for Product-Centric Advertising Image Design
Creating advertising images is often a labor-intensive and time-consuming process. Can we automatically generate such images using basic product information like a product foreground image, taglines, and a target size? Existing methods mainly focus on parts of the problem and lack a comprehensive solution. To bridge this gap, we propose a novel product-centric framework for advertising image design called T-Stars-Poster. It consists of four sequential stages to highlight product foregrounds and taglines while achieving overall image aesthetics: prompt generation, layout generation, background image generation, and graphics rendering. Different expert models are designed and trained for the first three stages: First, a visual language model (VLM) generates background prompts that match the products. Next, a VLM-based layout generation model arranges the placement of product foregrounds, graphic elements (taglines and decorative underlays), and various non-graphic elements (objects from the background prompt). Following this, an SDXL-based model simultaneously accepts prompts, layouts, and foreground controls to generate images. To support T-Stars-Poster, we create two corresponding datasets with over 50,000 labeled images. Extensive experiments and online A/B tests demonstrate that T-Stars-Poster can produce more visually appealing advertising images.
Let Topology Speak: Graph Neural Network with Topology-Aware Augmentation
Company financial risk is widespread, and accurate prediction is critical to avoiding significant losses. Many risky companies often exhibit subtle anomalies, incomplete information, or limited interactions with others; however, the types of their interactions remain diverse and informative. Existing methods like metapath-based Graph Neural Networks effectively leverage node relationships but are constrained by manual biases, noise introduction, and high computational complexity. Similarly, Graph Transformers show strong performance but suffer from prohibitively high complexity. To overcome these challenges, we propose the Graph Neural Network with Topology-Aware Augmentation (GTA). GTA adopts a dual augmentation strategy based on topology information, augmenting both topology and attributes. It first performs unification encoding on single node-type heterogeneous graphs, integrating heterogeneous topology into node representations. Expressive topology encoding is performed, followed by dual augmentation based on the learned topology embeddings. Through this approach, GTA achieves effective risk prediction. Extensive experiments on a real-world dataset demonstrate GTA's superior performance compared to state-of-the-art metapath-based and graph transformer-based methods, effectively handling sparse graphs with a single node-type and multiple edge-types. A comprehensive ablation study and visual analysis further validate the discriminative power of topology augmentation in distinguishing risky companies. Our code is publicly available at https://github.com/ckz123/GTA.
See Beyond a Single View: Multi-Attribution Learning Leads to Better Conversion Rate Prediction
Conversion rate (CVR) prediction is a core component of online advertising systems, where the attribution mechanisms (rules for allocating conversion credit across user touchpoints) fundamentally determine label generation and model optimization. While many industrial platforms support diverse attribution mechanisms (e.g., First-Click, Last-Click, Linear, and Data-Driven Multi-Touch Attribution), conventional approaches restrict model training to labels from a single production-critical attribution mechanism, discarding complementary signals in alternative attribution perspectives. To address this limitation, we propose a novel Multi-Attribution Learning (MAL) framework for CVR prediction that integrates signals from multiple attribution perspectives to better capture the underlying patterns driving user conversions. Specifically, MAL is a joint learning framework consisting of two core components: the Attribution Knowledge Aggregator (AKA) and the Primary Target Predictor (PTP). AKA is implemented as a multi-task learner that integrates knowledge extracted from diverse attribution labels. PTP, in contrast, focuses on the task of generating well-calibrated conversion probabilities that align with the system-optimized attribution metric (e.g., CVR under the Last-Click attribution), ensuring direct compatibility with industrial deployment requirements. Additionally, we propose CAT, a novel training strategy that leverages the Cartesian product of all attribution label combinations to generate enriched supervision signals. This design substantially enhances the performance of the attribution knowledge aggregator. Empirical evaluations demonstrate the superiority of MAL over single-attribution learning baselines, achieving a +0.51% GAUC improvement on offline metrics. Online experiments demonstrate that MAL achieved a +2.6% increase in ROI (Return on Investment).
Personalized Tree-Based Progressive Regression Model for Watch-Time Prediction in Short Video Recommendation
In online video platforms, accurate watch time prediction has become a fundamental and challenging problem in video recommendation. Previous research has revealed that the accuracy of watch time prediction highly depends on both the transformation of watch-time labels and the decomposition of the estimation process. TPM (Tree-based Progressive Regression Model) achieves state-of-the-art performance with a carefully designed and effective decomposition paradigm. TPM discretizes the watch time into several ordinal intervals and organizes them into a binary decision tree, where each node corresponds to a specific interval. At each non-leaf node, a binary classifier is used to determine the specific interval in which the watch time variable most likely falls, based on the prediction outcome at its parent node. The tree structure is central to TPM, as it defines the decomposition of watch time estimation and how ordinal intervals are discretized. However, TPM uses a predefined full binary tree, which may be sub-optimal for two reasons. First, full binary trees imply equal partitioning of the watch time space, which may fail to capture the complexity of real-world distributions. Second, rather than relying on a fixed global structure, we advocate for a personalized, data-driven tree that can be learned end-to-end. Thus, we propose PTPM to enable a highly personalized decomposition of watch-time estimation with better efficacy and efficiency. Moreover, we show that TPM suffers from selection bias due to conditional modeling and propose a simple solution. We conduct extensive experiments on offline datasets and online environments. Offline results show improved watch time accuracy, and online A/B tests further validate the effectiveness of our framework. PTPM has been fully deployed in core traffic scenarios and now serves over 400 million users daily.
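As a rough illustration of the tree-based decomposition described above, the sketch below traverses a toy interval tree with per-node binary classifiers and returns the midpoint of the reached leaf interval; the dict-based tree and the threshold are assumptions, not the paper's data structure.

```python
def tree_progressive_predict(node, features):
    """Route `features` down a binary interval tree using per-node classifiers
    and return the midpoint of the reached leaf's watch-time interval."""
    while node.get("left") is not None:                     # internal node
        go_right = node["classifier"](features) > 0.5
        node = node["right"] if go_right else node["left"]
    lo, hi = node["interval"]
    return (lo + hi) / 2.0

# Toy tree: two leaves covering [0, 30) and [30, 120) seconds of watch time.
leaf_a = {"left": None, "interval": (0.0, 30.0)}
leaf_b = {"left": None, "interval": (30.0, 120.0)}
root = {"left": leaf_a, "right": leaf_b, "classifier": lambda f: 0.8}
print(tree_progressive_predict(root, features=None))        # -> 75.0
```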
Neighbor-enhanced Graph Pre-training and Prompt Learning Framework for Fraud Detection
Nowadays, as more users turn to WeChat Pay and other e-commerce platforms for transactions, an increasing number of fraudsters are being attracted to these platforms to conduct fraudulent activities, thereby stealing money. To address this issue, Graph Neural Networks (GNNs) have been widely adopted and have shown great success. However, with the rise of various transaction methods, users are increasingly engaging in multiple transaction networks, which creates a new scenario that requires models to detect fraud across these diverse networks. Unfortunately, current GNN-based fraud detection strategies often exhibit suboptimal performance and high time complexity in this evolving scenario, as they typically can handle only one transaction network at a time. Recently, advancements in graph prompt learning have demonstrated great success in managing various types of graph data and improving the generalization capabilities of the model, showing great promise for addressing this new fraud detection scenario. Nevertheless, the practical application of graph prompt learning in real-world fraud detection is still constrained, as existing methods may exhibit bias when dealing with multiplex transaction networks and may fail to model the intrinsic relationships between nodes and their neighbors, which is crucial for effective fraud detection. To address these two challenges, we propose GPCF, an efficient graph pre-training and prompt learning framework. GPCF first incorporates a meta-learning-based strategy within neighbor-enhanced contrastive learning to pre-train the GNN model across diverse transaction networks. Then it aligns fraud detection tasks with the well-pre-trained model by simply fine-tuning the prompts. Extensive experiments demonstrate that GPCF achieves state-of-the-art results on open-access fraud and transaction datasets, as well as on real-world fraud datasets from WeChat Pay, one of the largest e-commerce platforms globally, showing the effectiveness of GPCF in practical applications.
Reference-Aligned Retrieval-Augmented Question Answering over Heterogeneous Proprietary Documents
Proprietary corporate documents contain rich domain-specific knowledge, but their overwhelming volume and disorganized structure make it difficult even for employees to access the right information when needed. For example, in the automotive industry, vehicle crash-collision tests, each costing hundreds of thousands of dollars, produce highly detailed documentation. However, retrieving relevant content during decision-making remains time-consuming due to the scale and complexity of the material. While Retrieval-Augmented Generation (RAG)-based Question Answering (QA) systems offer a promising solution, building an internal RAG-QA system poses several challenges: (1) handling heterogeneous multi-modal data sources, (2) preserving data confidentiality, and (3) enabling traceability between each piece of information in the generated answer and its original source document. To address these, we propose a RAG-QA framework for internal enterprise use, consisting of: (1) a data pipeline that converts raw multi-modal documents into a structured corpus and QA pairs, (2) a fully on-premise, privacy-preserving architecture, and (3) a lightweight reference matcher that links answer segments to supporting content. Applied to the automotive domain, our system improves factual correctness (+1.79, +1.94), informativeness (+1.33, +1.16), and helpfulness (+1.08, +1.67) over a non-RAG baseline, based on 1-5 scale ratings from both human and LLM judges. The system was deployed internally for pilot testing and received positive feedback from employees.
When Words Can't Capture It All: Towards Video-Based User Complaint Text Generation with Multimodal Video Complaint Dataset
While there is a substantial body of work on explainable complaint mining, articulating user concerns through text or video remains a significant challenge, often leaving issues unresolved. Users frequently struggle to express their complaints clearly in text but can easily upload videos depicting product defects (e.g., vague text such as 'worst product' paired with a 5-second video depicting a broken headphone with the right earcup). This paper formulates a new complaint-mining task, Complaint Description from Videos (CoD-V), to help everyday users write expressive complaints (e.g., to help the above user articulate her complaint about the defective right earcup). To this end, we introduce ComVID, a video complaint dataset containing 1,175 complaint videos and the corresponding descriptions, also annotated with the emotional state of the complainer. Additionally, we present a new complaint retention (CR) evaluation metric that distinguishes the proposed CoD-V task from standard video summarization and description tasks. To strengthen this initiative, we introduce a multimodal Retrieval-Augmented Generation (RAG) embedded VideoLLaMA2-7b model, designed to generate complaints while accounting for the user's emotional state. We conduct a comprehensive evaluation of several Video Language Models on several tasks (pre-trained and fine-tuned versions) with a range of established evaluation metrics, including METEOR, perplexity, and the Coleman-Liau readability score, among others. Our study lays the foundation for a new research direction to provide a platform for users to express complaints through video. Dataset and resources are available at: https://github.com/sarmistha-D/CoD-V.
RottenReviews: Benchmarking Review Quality with Human and LLM-Based Judgments
The quality of peer review plays a critical role in scientific publishing, yet remains poorly understood and challenging to evaluate at scale. In this work, we introduce RottenReviews, a benchmark designed to facilitate systematic assessment of review quality. RottenReviews comprises over 15,000 submissions from four distinct academic venues enriched with over 9,000 reviewer scholarly profiles and paper metadata. We define and compute a diverse set of quantifiable review-dependent and reviewer-dependent metrics, and compare them against structured assessments from large language models (LLMs) and expert human annotations. Our human-annotated subset includes over 700 paper-review pairs labeled across 13 explainable and conceptual dimensions of review quality. Our empirical findings reveal that LLMs, both zero-shot and fine-tuned, exhibit limited alignment with human expert evaluations of peer review quality. Surprisingly, simple interpretable models trained on quantifiable features outperform fine-tuned LLMs in predicting overall review quality. We publicly release all data, code, and models at https://github.com/Reviewerly-Inc/RottenReviews to support further research in this area.
Building a Virtual Member of a Community of Practice
We describe a virtual member of a knowledge management Community of Practice (CoP), called ATHENA, that knows an individual, his tasks, his organization, and the community. ATHENA employs an agentic chat capability that combines embeddings with knowledge-based faceted search to provide accurate responses to technical questions along with rationale and citations for efficient validation. ATHENA supports natural, in-the-flow capture of task-related insights to share within a CoP, along with proactive dissemination of information tied to an individual and his current needs. An evaluation involving 75 professionals from the Oil & Gas sector shows that ATHENA dramatically improved outcomes and productivity on a set of well-planning tasks compared to their use of a state-of-the-art RAG baseline. Interestingly, ATHENA also enabled eight non-experts to perform at expert levels.
NeighSqueeze: Compact Neighborhood Grouping for Efficient Billion-Scale Heterogeneous Graph Learning
The rapid growth of online shopping has intensified competition among logistics companies, highlighting the importance of customer expansion, i.e., identifying customers willing to establish long-term contracts. Although existing approaches frame customer expansion as a node classification task using heterogeneous graph learning to capture complex interactions between a customer and other items, it is computationally infeasible to utilize all neighboring interactions on large-scale logistics graphs. Current sub-sampling methods reduce computational load by sampling a small part of the neighborhood for training. However, they introduce substantial information loss, particularly affecting high-degree nodes and decreasing predictive accuracy. To address this, we introduce NeighSqueeze, a novel approach that groups structurally and semantically similar nodes, substantially reducing the neighbor count and facilitating full-neighbor learning. NeighSqueeze consists of three modules designed to efficiently and effectively enable node grouping on billion-scale heterogeneous graphs: (1) structure-tightness-based neighbor filtering reduces the high redundancy and complexity in similarity computations; (2) hybrid similarity graph construction addresses the difficulty of measuring node similarity at scale; and (3) a two-level grouping strategy resolves the label dominance issue within groups. We evaluate NeighSqueeze at JD Logistics, one of the largest logistics companies in China. Compared with sub-sampling methods, NeighSqueeze exhibits lower runtime and memory usage with full-neighbor training on the compressed graph, while simultaneously improving average precision by over 28.9% in offline evaluation and increasing the new-customer exploration rate by 18.6% in online A/B testing.
Converted Data is All You Need for Causal Optimization of e-Commerce Promotions
Promotional campaigns are essential drivers of customer engagement and revenue in e-commerce. Maintaining these campaigns within budget constraints requires targeted allocation, traditionally achieved through causal uplift models that rely on vast datasets of user interactions, including non-converted sessions, which introduce challenges such as noisy data, attribution complexity and imbalanced outcomes. We propose a novel approach using converted-only data, which reduces training data size, simplifies attribution, improves efficiency, and mitigates the impact of non-converted interactions. We present a generalized framework for budget constrained promotion allocation with converted-only data and validate it through a benchmarking study and multiple large-scale deployments at Booking.com, positively impacting the experience of millions of customers worldwide. Our results demonstrate that the proposed method is competitive with standard modeling approaches and, in some cases, significantly outperforms them.
SSH-T3: A Hierarchical Pre-training Framework for Multi-Scenario Financial Risk Assessment
Efficiently modeling user behavior on online payment platforms is crucial for accurately identifying potential financial risks. With the rapid growth of online payment platforms, the volume of user transaction data has significantly increased. Moreover, users' payment behaviors often encompass diverse activities and interactions across multiple scenarios. Based on observations from online payment platforms, we identify three key challenges: scarce labels and poor representation robustness, long user payment behavior sequences, and complex and heterogeneous amount-aware scenarios. To address these challenges, we propose a novel Self-Supervised Hierarchical Two-Tower Transformer (SSH-T3), specifically designed for multi-scenario financial risk assessments. We introduce a masked modeling pre-training task to reconstruct multi-scenario day-level transaction amount distributions, effectively mitigating behavior-level noise and enhancing representation robustness. Additionally, we propose a hierarchical Multi-Scenario Payment Behavior Sequence (MS-PBS) modeling approach tailored to business needs, which significantly reduces complexity while capturing user behavior patterns more effectively through day-level representations. Furthermore, we highlight the critical importance of correlating multi-scenario data in MS-PBS modeling to better identify defaulter patterns. To this end, we design a Two-Tower Transformer equipped with a specialized attention mechanism that captures intricate user patterns across scenarios. Extensive experiments conducted on both offline and online real-world business datasets demonstrate the effectiveness and applicability of SSH-T3.
MISS: Multi-Modal Tree Indexing and Searching with Lifelong Sequential Behavior for Retrieval Recommendation
Large-scale industrial recommendation systems typically employ a two-stage paradigm of retrieval and ranking to handle huge amounts of information. Recent research focuses on improving the performance of retrieval models. A promising way is to introduce extensive information about users and items. On one hand, lifelong sequential behavior is valuable. Existing lifelong behavior modeling methods in the ranking stage focus on the interaction between lifelong behavior and the candidate items coming from the retrieval stage; in the retrieval stage itself, it is difficult to utilize lifelong behavior because of the large corpus of candidate items. On the other hand, existing retrieval methods mostly rely on interaction information, potentially disregarding valuable multi-modal information. To solve these problems, we present a pioneering exploration of leveraging multi-modal information and lifelong sequence modeling within an advanced tree-based retrieval model. We propose Multi-modal Indexing and Searching with lifelong Sequence (MISS), which contains a multi-modal index tree and a multi-modal lifelong sequence modeling module. Specifically, for a better index structure, we propose a multi-modal index tree, built from multi-modal embeddings to precisely represent item similarity. To precisely capture diverse user interests in a user's lifelong sequence, we propose a collaborative general search unit (Co-GSU) and a multi-modal general search unit (MM-GSU) for multi-perspective interest search.
BiListing: Modality Alignment for Listings
Airbnb is a leader in offering travel accommodations. Airbnb has historically relied on structured data to understand, rank, and recommend listings to guests due to the limited capabilities and associated complexity arising from extracting meaningful information from text and images. With the rise of representation learning, leveraging rich information from text and photos has become easier. A popular approach has been to create embeddings for text documents and images to enable use cases of computing similarities between listings or using embeddings as features in an ML model. However, an Airbnb listing has diverse unstructured data: multiple images and various unstructured text documents such as the title, description, and reviews, making this approach challenging. Specifically, it is a non-trivial task to combine multiple embeddings of different pieces of information, i.e., each image, each review, etc., into a single meaningful listing representation, especially if some of the embeddings lie in different spaces. Faced with such a problem, practitioners often resort to unprincipled approaches of averaging embeddings to produce a single one. However, this often results in an inaccurate representation due to loss of information in the averaging process. This paper proposes BiListing, for Bimodal Listing, an approach to align text and photos of a listing by leveraging large-language models and pretrained language-image models. The BiListing approach has several favorable characteristics: capturing unstructured data into a single embedding vector per listing and modality, enabling zero-shot capability to search inventory efficiently in user-friendly semantics, overcoming the cold start problem, and enabling listing-to-listing search along a single modality, or both. We conducted offline and online tests leveraging the BiListing embeddings in the Airbnb search ranking model and successfully deployed them in production, achieving a 0.425% NDCB gain and driving tens of millions in incremental revenue.
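To make the single-vector-per-listing idea concrete, here is a minimal sketch assuming each listing's text and image embeddings already live in a shared text-image space (e.g., a CLIP-style model); the pooling rule, function names, and random data are illustrative assumptions, not Airbnb's pipeline.

```python
# Illustrative sketch (not Airbnb's code): pool a listing's embeddings into one
# vector, then run a zero-shot text query against all listing vectors by cosine
# similarity. Assumes embeddings share a text-image space.
import numpy as np


def pool_listing(embeddings: np.ndarray) -> np.ndarray:
    """L2-normalize each embedding, mean-pool, then re-normalize."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    pooled = unit.mean(axis=0)
    return pooled / np.linalg.norm(pooled)


def zero_shot_search(query_vec: np.ndarray, listing_vecs: np.ndarray, k: int = 5):
    """Rank listings by cosine similarity to a text query embedding."""
    query_vec = query_vec / np.linalg.norm(query_vec)
    scores = listing_vecs @ query_vec
    return np.argsort(-scores)[:k], np.sort(scores)[::-1][:k]


rng = np.random.default_rng(0)
listing_vecs = np.stack([pool_listing(rng.normal(size=(8, 64))) for _ in range(100)])
top_ids, top_scores = zero_shot_search(rng.normal(size=64), listing_vecs)
print(top_ids)
```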
Out of Distribution Detection for Efficient Continual Learning in Quality Prediction for Arc Welding
Modern manufacturing relies heavily on fusion welding processes, including gas metal arc welding (GMAW). Despite significant advances in machine learning-based quality prediction, current models exhibit critical limitations when confronted with the inherent distribution shifts that occur in dynamic manufacturing environments. In this work, we extend the VQ-VAE Transformer architecture, which previously demonstrated state-of-the-art performance in weld quality prediction, by leveraging its autoregressive loss as a reliable out-of-distribution (OOD) detection mechanism. Our approach exhibits superior performance compared to conventional reconstruction methods, embedding error-based techniques, and other established baselines. By integrating OOD detection with continual learning strategies, we optimize model adaptation, triggering updates only when necessary and thereby minimizing costly labeling requirements. We introduce a novel quantitative metric that simultaneously evaluates OOD detection capability while interpreting in-distribution performance. Experimental validation in real-world welding scenarios demonstrates that our framework effectively maintains robust quality prediction capabilities across significant distribution shifts, addressing critical challenges in dynamic manufacturing environments where process parameters frequently change. This research makes a substantial contribution to applied artificial intelligence by providing an explainable and at the same time adaptive solution for quality assurance in dynamic manufacturing processes, a crucial step towards robust, practical AI systems in the industrial environment.
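The OOD-gated continual-learning loop can be pictured with a simple quantile threshold on per-sample autoregressive loss. The sketch below is a generic illustration under that assumption, not the paper's VQ-VAE Transformer code; the loss values are simulated.

```python
# Sketch of autoregressive-loss-based OOD gating for continual learning:
# flag a batch as out-of-distribution when enough of its per-sample negative
# log-likelihoods exceed a quantile threshold calibrated on in-distribution
# validation data, and only then trigger (costly) labeling and adaptation.
import numpy as np


def calibrate_threshold(val_nll: np.ndarray, quantile: float = 0.99) -> float:
    return float(np.quantile(val_nll, quantile))


def needs_adaptation(batch_nll: np.ndarray, threshold: float, frac: float = 0.5) -> bool:
    """Trigger a continual-learning update only if enough samples look OOD."""
    return float((batch_nll > threshold).mean()) >= frac


rng = np.random.default_rng(1)
val_nll = rng.normal(2.0, 0.3, size=5000)   # in-distribution validation losses
shifted = rng.normal(3.2, 0.4, size=256)    # batch after a process-parameter change
thr = calibrate_threshold(val_nll)
print(thr, needs_adaptation(shifted, thr))
```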
Beyond Pairwise Learning-To-Rank At Airbnb
There are three fundamental asks from a ranking algorithm: it should scale to handle a large number of items, sort items accurately by their utility, and impose a total order on the items for logical consistency. But here's the catch: no algorithm can achieve all three at the same time. We call this limitation the SAT theorem for ranking algorithms. Given the dilemma, how can we design a practical system that meets user needs? Our current work at Airbnb provides an answer, with a working solution deployed at scale. We start with pairwise learning-to-rank (LTR) models, the bedrock of search ranking tech stacks today. They scale linearly with the number of items ranked and perform strongly on metrics like NDCG by learning from pairwise comparisons. They are at a sweet spot of performance vs. cost, making them an ideal choice for several industrial applications. However, they have a drawback: by ignoring interactions between items, they compromise on accuracy. To improve accuracy, we create a "true" pairwise LTR model, one that captures interactions between items during pairwise comparisons. But accuracy comes at the expense of scalability and total order, and we discuss strategies to counter these challenges. Traveling further along the road to greater accuracy, we take each item in the search result and compare it against the rest of the items along two dimensions: (1) Superiority: How strongly do searchers prefer the given item over the remaining ones? (2) Similarity: How similar is the given item to all the other items? This forms the basis of our "all-pairwise" LTR framework, which factors in interactions across all items at once. Looking at items on the search result page all together, superiority and similarity combined, gives us a deeper understanding of what searchers truly want. We quantify the resulting improvements in searcher experience through offline and online experiments at Airbnb.
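A toy sketch of the "all-pairwise" idea follows: given a superiority matrix of pairwise preference probabilities and a pairwise similarity matrix, items are ranked greedily with a similarity penalty. The aggregation rule and penalty weight are assumptions for illustration, not Airbnb's production model.

```python
# Toy sketch of "all-pairwise" scoring: aggregate how strongly each item is
# preferred over every other item (superiority) and discount items that are
# near-duplicates of higher-ranked ones (similarity). The aggregation rule is
# an assumption, not the deployed Airbnb ranker.
import numpy as np


def all_pairwise_rank(superiority: np.ndarray, similarity: np.ndarray, lam: float = 0.3):
    """superiority[i, j]: P(item i preferred over item j); similarity in [0, 1]."""
    n = superiority.shape[0]
    base = superiority.sum(axis=1) / (n - 1)   # average win-rate per item
    ranked, remaining = [], list(range(n))
    while remaining:
        # Greedily pick the best base score, penalized by similarity to items
        # already placed (diversity-aware re-ranking).
        penalties = np.array([
            max((similarity[i, j] for j in ranked), default=0.0) for i in remaining
        ])
        scores = base[remaining] - lam * penalties
        ranked.append(remaining.pop(int(scores.argmax())))
    return ranked


rng = np.random.default_rng(2)
S = rng.uniform(size=(5, 5)); np.fill_diagonal(S, 0.5)
D = rng.uniform(size=(5, 5)); D = (D + D.T) / 2; np.fill_diagonal(D, 1.0)
print(all_pairwise_rank(S, D))
```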
Augmenting Limited and Biased RCTs through Pseudo-Sample Matching-Based Observational Data Fusion Method
In the online ride-hailing pricing context, companies often conduct randomized controlled trials (RCTs) and utilize uplift models to assess the effect of discounts on customer orders, which substantially influences competitive market outcomes. However, due to the high cost of RCTs, the proportion of trial data relative to observational data is small, accounting for only 0.65% of total traffic in our context, which results in significant bias when generalizing to the broader user base. Additionally, the complexity of industrial processes reduces the quality of RCT data, which is often subject to heterogeneity from potential interference and selection bias, making it difficult to correct. Moreover, existing data fusion methods are challenging to implement effectively in complex industrial settings due to the high dimensionality of features and the strict assumptions that are hard to verify with real-world data. To address these issues, we propose an empirical data fusion method called pseudo-sample matching. By generating pseudo-samples from biased, low-quality RCT data and matching them with the most similar samples from large-scale observational data, the method expands the RCT dataset while mitigating its heterogeneity. We validated the method through simulation experiments and conducted offline and online tests using real-world data. In a week-long online experiment, we achieved a 0.41% improvement in profit, which is a considerable gain when scaled to industrial scenarios with hundreds of millions in revenue. In addition, we discuss the harm to model training, offline evaluation, and online economic benefits when RCT data quality is low, and emphasize the importance of improving RCT data quality in industrial scenarios. Further details of the simulation experiments can be found in the GitHub repository https://github.com/Kairong-Han/Pseudo-Matching.
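The matching step can be approximated with standard nearest-neighbor search. The sketch below assumes pseudo-samples and observational samples share a numeric feature space; the scaler, choice of k, and data are illustrative, not the deployed system.

```python
# Minimal sketch of pseudo-sample matching: for each RCT-derived pseudo-sample,
# retrieve its nearest observational neighbors in feature space and add them to
# the training pool. Not the authors' implementation.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def augment_with_matches(pseudo_X: np.ndarray, obs_X: np.ndarray, k: int = 5):
    scaler = StandardScaler().fit(obs_X)
    nn = NearestNeighbors(n_neighbors=k).fit(scaler.transform(obs_X))
    _, idx = nn.kneighbors(scaler.transform(pseudo_X))
    matched = np.unique(idx.ravel())   # de-duplicate matched observational rows
    return matched                     # indices into obs_X to add to training


rng = np.random.default_rng(3)
pseudo_X = rng.normal(size=(50, 10))       # small, biased RCT-derived set
obs_X = rng.normal(size=(100000, 10))      # large observational pool
print(augment_with_matches(pseudo_X, obs_X).shape)
```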
Development of Autonomous Failure Maintenance System for Semiconductor Manufacturing
Semiconductor equipment failure analysis is critical in the fast-paced semiconductor industry, where complex and sensitive components are susceptible to failures that can lower productivity, raise costs, and shorten equipment lifespan. However, conventional failure analysis methods rely on time-series-based fault detection and classification to identify the cause of failure, and still depend on experts to determine appropriate corrective actions. We present a fully deployed autonomous maintenance system that closes the failure recovery loop, from detecting anomalies to executing corrective actions, by integrating graph-based log analysis, LLM-based semantic reasoning, and hybrid retrieval mechanisms. Deployed across 762 units of photolithography equipment at a Samsung Electronics fab from 2021 to 2025, the system has handled over 85,000 real-world breakdowns and reduced Mean Time to Recovery by 7 minutes on average. This autonomous system enhances productivity, reduces costs, and maintains high product quality, supporting the transition to fully automated environments in the semiconductor industry.
MTGR: Industrial-Scale Generative Recommendation Framework in Meituan
Scaling laws have recently been validated in recommendation systems that adopt generative recommendation strategies to achieve scalability. However, these generative approaches require abandoning the meticulously constructed cross features of traditional recommendation models, leading to a significant decline in model performance. To address this challenge, we propose Meituan Generative Recommendation (MTGR), which is based on the HSTU architecture and is capable of retaining the original deep learning recommendation model (DLRM) features, including cross features. Additionally, MTGR achieves training and inference acceleration through user-level compression to ensure efficient scaling. We also propose Group-Layer Normalization (GLN) to enhance the performance of encoding within different semantic spaces and a dynamic masking strategy to avoid information leakage. We further optimize the training framework, enabling support for models with 10 to 100 times the computational complexity of the DLRM without significant cost increases. Scaled to 65x the FLOPs of the DLRM model for single-sample forward inference, MTGR delivered the largest offline and online gains in nearly two years. This breakthrough was successfully deployed on Meituan, the world's largest food delivery platform, where it has been handling the main traffic.
Cross-Domain Graph Neural Networks for Notification at LinkedIn
Notification recommendation systems are critical to driving user engagement on professional platforms like LinkedIn. Designing such systems involves integrating heterogeneous signals across domains, capturing temporal dynamics, and optimizing for multiple, often competing, objectives. Graph Neural Networks (GNNs) provide a powerful framework for modeling complex interactions in such environments. In this paper, we present a cross-domain GNN-based system deployed at LinkedIn that unifies user, content, and activity signals into a single, large-scale graph. By training on this cross-domain structure, our model significantly outperforms single-domain baselines on key tasks, including click-through rate (CTR) prediction and professional engagement. We introduce architectural innovations including temporal modeling and multi-task learning, which further enhance performance. Deployed in LinkedIn's notification system, our approach led to a 0.10% lift in weekly active users and a 0.62% improvement in CTR. We detail our graph construction process, model design, training pipeline, and both offline and online evaluations. Our work demonstrates the scalability and effectiveness of cross-domain GNNs in real-world, high-impact applications.
Heterogeneous Influence Maximization in User Recommendation
User recommendation systems enhance user engagement by encouraging users to act as inviters to interact with other users (invitees), potentially fostering information propagation. Conventional recommendation methods typically focus on modeling interaction willingness, while Influence-Maximization (IM) methods focus on identifying a set of users to maximize information propagation. However, existing methods face two significant challenges. First, recommendation methods fail to unleash the candidates' spread capability. Second, IM methods fail to account for the willingness to interact. To solve these issues, we propose two models named HeteroIR and HeteroIM. HeteroIR provides an intuitive solution to unleash the dissemination potential of user recommendation systems. HeteroIM fills the gap between the IM method and the recommendation task, improving interaction willingness and maximizing spread coverage. HeteroIR introduces a two-stage framework to estimate spread profits. HeteroIM incrementally selects the most influential invitee to recommend and reranks candidates based on the number of reverse-reachable (RR) sets containing inviters and invitees, where an RR set denotes a set of nodes that can reach a target via propagation. Extensive experiments show that HeteroIR and HeteroIM significantly outperform the state-of-the-art baselines with p-value < 0.05. Furthermore, we have deployed HeteroIR and HeteroIM in Tencent's online gaming platforms and gained 8.5% and 10% improvements in the online A/B test, respectively. Implementation codes are available at https://github.com/socialalgo/HIM.
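The RR-set machinery referenced above is typically paired with greedy maximum coverage over the sampled RR sets. Below is a textbook sketch of that selection step (RR-set generation via random reverse traversal is omitted); it illustrates the general technique, not HeteroIM itself.

```python
# Textbook greedy maximum coverage over reverse-reachable (RR) sets:
# repeatedly pick the node that covers the most not-yet-covered RR sets.
from collections import defaultdict


def greedy_seed_selection(rr_sets, k):
    """rr_sets: list of sets of node ids; returns up to k seed nodes."""
    node_to_rr = defaultdict(set)
    for rr_id, rr in enumerate(rr_sets):
        for node in rr:
            node_to_rr[node].add(rr_id)
    covered, seeds = set(), []
    for _ in range(k):
        best = max(node_to_rr, key=lambda n: len(node_to_rr[n] - covered), default=None)
        if best is None:
            break
        seeds.append(best)
        covered |= node_to_rr[best]
    return seeds


rr_sets = [{1, 2}, {2, 3}, {3}, {1, 4}, {4, 5}]
print(greedy_seed_selection(rr_sets, k=2))   # nodes covering the most RR sets
```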
An LLM-based Behavior Modeling Framework for Malicious User Detection
Malicious users pose significant threats to social platforms. Extensive efforts have leveraged user behavior sequences to model relationships between various actions and capture behavioral patterns for malicious user detection; however, they rely on behavior IDs, ignoring valuable behavior content such as self-introductions in friend requests, which offers crucial clues for detecting malicious users. We thus propose leveraging Large Language Models (LLMs) to jointly model IDs and content in user behavior sequences. The key to effective malicious user detection is to infer malicious user behavior patterns. However, inferring these patterns from labeled behavior sequences suffers from poor data efficiency and limited generalization, resulting in suboptimal malicious user detection performance. To overcome these limitations, we propose leveraging the malicious user specifications (e.g., definitions and common deceptive tactics) from the existing expert handbook. These specifications guide LLMs in reasoning over user behavior IDs and content before making predictions. To this end, we introduce an LLM-based behavior modeling framework with an expert handbook to enhance LLMs' behavior reasoning. We first distill the user's behaviors into a concise summary, guided by the malicious user specifications in the expert handbook, and then feed the summary and the user's demographic features into LLMs for comprehensive reasoning and detection. We conduct extensive online and offline experiments on the Weixin platform, validating the superiority of the proposed framework over the original Weixin detection baseline, achieving, for example, a 5.34% improvement in F1-Score.
GOProteinGNN: Leveraging Protein Knowledge Graphs for Protein Representation Learning
Proteins are central to biological processes and indispensable for living organisms. Accurate representation of proteins is crucial, especially in drug development. Recent advances have applied machine learning for unsupervised protein representation learning. However, these approaches often focus solely on the amino acid sequence of proteins and lack factual knowledge about proteins and their interactions, thus limiting their performance. In this study, we present GOProteinGNN, a novel architecture that enhances protein language models by integrating protein knowledge graph information during the creation of amino acid level representations. Our approach allows for the integration of information at both the individual amino acid level and the entire protein level, enabling a comprehensive and effective learning process through graph-based learning. By doing so, we can capture complex relationships and dependencies between proteins and their functional annotations, resulting in more robust and contextually enriched protein representations. Unlike previous methods, GOProteinGNN uniquely learns the entire protein knowledge graph during training, which allows it to capture broader relational nuances and dependencies beyond mere triplets as done in previous work. We perform a comprehensive evaluation on several downstream tasks, demonstrating that GOProteinGNN consistently outperforms previous methods, showcasing its effectiveness and establishing it as a state-of-the-art solution for protein representation learning. We discuss the practical integration of GOProteinGNN in a laboratory setting for lipid nanoparticle-based drug delivery, aiming to bypass the blood-brain barrier and discover novel components, with positive results observed in mice.
Smart ECU: Scalable On-Vehicle Deployment of Drivetrain Fault Classification Systems for Commercial Electric Vehicles
We present Smart ECU, the first on-vehicle drivetrain fault classification solution for motor-reducers on commercial electric vehicles (EVs), designed to be scalable in mass production. To develop and validate this system, we collect real-world vibration data from seven different EV models (e.g. Hyundai IONIQ 5, KIA EV6) and over 19 drivetrains under diverse driving conditions. This work addresses key challenges in deploying motor-reducer fault classification functionality onto an extremely resource-constrained ECU environment, facilitating on-vehicle deployment of PHM solutions on commercially manufactured EVs. Specifically, we tackle the following challenges: (1) real-vehicle data collection, (2) development under tight ECU resource constraints, (3) class imbalance between normal and fault conditions, and (4) scalability of the fault classification system across different EV models with limited fault data availability. We deploy and evaluate Smart ECU on both intra-car and inter-car scenarios, showing strong generalization performance in both setups. The proposed method enables rapid development of fault classification for new vehicle designs without requiring fault data from customer usage, significantly shortening the deployment timeline. Our solution addresses both technical and industrial challenges in deploying ECU-based smart diagnostics for commercial EVs, while also demonstrating broader applicability beyond drivetrain systems to other critical vehicle components. To the best of our knowledge, this is the first work to (1) collect real-world motor-reducer fault data, (2) implement a lightweight fault classification algorithm on ECUs, and (3) demonstrate its scalability across various EV types.
DeepAries: Adaptive Rebalancing Interval Selection for Enhanced Portfolio Selection
We propose DeepAries, a novel deep reinforcement learning framework for dynamic portfolio management that jointly optimizes the timing and allocation of rebalancing decisions. Unlike prior reinforcement learning methods that employ fixed rebalancing intervals regardless of market conditions, DeepAries adaptively selects optimal rebalancing intervals along with portfolio weights to reduce unnecessary transaction costs and maximize risk-adjusted returns. Our framework integrates a Transformer-based state encoder, which effectively captures complex long-term market dependencies, with Proximal Policy Optimization (PPO) to generate simultaneous discrete (rebalancing intervals) and continuous (asset allocations) actions. Extensive experiments on multiple real-world financial markets demonstrate that DeepAries significantly outperforms traditional fixed-frequency and full-rebalancing strategies in terms of risk-adjusted returns, transaction costs, and drawdowns. Additionally, we provide a live demo of DeepAries at https://deep-aries.github.io/, along with the source code and dataset at https://github.com/dmis-lab/DeepAries, illustrating DeepAries' capability to produce interpretable rebalancing and allocation decisions aligned with shifting market regimes. Overall, DeepAries introduces an innovative paradigm for adaptive and practical portfolio management by integrating both timing and allocation into a unified decision-making process.
Exploring Database Normalization Effects on SQL Generation
Schema design, particularly normalization, is a critical yet often overlooked factor in natural language to SQL (NL2SQL) systems. Most prior research evaluates models on fixed schemas, overlooking the influence of design on performance. We present the first systematic study of schema normalization's impact, evaluating eight leading large language models on synthetic and real-world datasets with varied normalization levels. We construct controlled synthetic datasets with formal normalization (1NF-3NF) and real academic paper datasets with practical schemas. Our results show that denormalized schemas offer high accuracy on simple retrieval queries, even with cost-effective models in zero-shot settings. In contrast, normalized schemas (2NF/3NF) introduce challenges such as errors in base table selection and join type prediction; however, these issues are substantially mitigated by providing few-shot examples. For aggregation queries, normalized schemas yield better performance, mainly due to their robustness against the data duplication and NULL value issues that cause errors in denormalized schemas. These findings suggest that the optimal schema design for NL2SQL applications depends on the types of queries to be supported. Our study demonstrates the importance of considering schema design when developing NL2SQL interfaces and of integrating adaptive schema selection for real-world scenarios.
THEME: Enhancing Thematic Investing with Semantic Stock Representations and Temporal Dynamics
Thematic investing, which aims to construct portfolios aligned with structural trends, remains a challenging endeavor due to overlapping sector boundaries and evolving market dynamics. A promising direction is to build semantic representations of investment themes from textual data. However, despite their power, general-purpose LLM embedding models are not well suited to capturing the nuanced characteristics of financial assets, since the semantic representation of investment assets may differ fundamentally from that of general financial text. To address this, we introduce THEME, a framework that fine-tunes embeddings using hierarchical contrastive learning. THEME aligns themes and their constituent stocks using their hierarchical relationship, and subsequently refines these embeddings by incorporating stock returns. This process yields representations effective for retrieving thematically aligned assets with strong return potential. Empirical results demonstrate that THEME excels in two key areas. For thematic asset retrieval, it significantly outperforms leading large language models. Furthermore, its constructed portfolios demonstrate compelling performance. By jointly modeling thematic relationships from text and market dynamics from returns, THEME generates stock embeddings specifically tailored for a wide range of practical investment applications.
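One way to picture the hierarchical alignment step is an InfoNCE-style loss that pulls each theme embedding toward the pooled embedding of its constituent stocks. The sketch below makes that assumption and omits the return-aware refinement stage; temperature, pooling, and tensor shapes are illustrative, not THEME's exact training code.

```python
# Minimal InfoNCE-style sketch of theme-to-stock alignment: each theme is a
# positive for the mean embedding of its own stocks and a negative for the rest.
import torch
import torch.nn.functional as F


def theme_stock_contrastive_loss(theme_emb, stock_emb, theme_ids, temperature=0.07):
    """theme_emb: [T, d]; stock_emb: [S, d]; theme_ids[s] = theme index of stock s."""
    T = theme_emb.size(0)
    # Mean-pool stocks per theme to get one positive per theme.
    pooled = torch.zeros_like(theme_emb).index_add_(0, theme_ids, stock_emb)
    counts = torch.bincount(theme_ids, minlength=T).clamp(min=1).unsqueeze(1)
    pooled = F.normalize(pooled / counts, dim=1)
    themes = F.normalize(theme_emb, dim=1)
    logits = themes @ pooled.t() / temperature   # [T, T]; diagonal entries are positives
    labels = torch.arange(T)
    return F.cross_entropy(logits, labels)


theme_emb = torch.randn(4, 32, requires_grad=True)
stock_emb = torch.randn(20, 32)
theme_ids = torch.randint(0, 4, (20,))
print(theme_stock_contrastive_loss(theme_emb, stock_emb, theme_ids).item())
```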
Waypoint POI Recommendation for Vehicle Navigation Services using Hierarchical Graphs and Contrastive Learning
Modern vehicle navigation systems can greatly benefit from waypoint point-of-interest (POI) recommendation, which suggests personalized intermediate stops along a driving route. This paper defines the novel waypoint POI recommendation problem: given a starting point and a destination, recommend one or more personalized POIs to visit en route. This scenario (e.g., suggesting a lunch stop during a road trip) differs from the conventional "next POI" recommendation in that it infers waypoint POIs from only two (origin and destination) inputs and predicts multiple intermediate stops rather than a single next location. To solve this problem, we propose WayPOI, a novel recommender model for waypoint POI suggestion based on hierarchical-graph-based contrastive learning. WayPOI constructs a hierarchical graph that captures both individual and group-level behavioral patterns of users and POIs, and it employs a contrastive learning strategy to learn effective user and POI representations from sparse data. Through experiments on real-world driving data provided by Hyundai as well as on three public datasets, we demonstrate that WayPOI significantly outperforms several recent POI recommendation models, even though these baselines were carefully re-formed and retrained to perform waypoint recommendation for a fair comparison. Our ablation study confirms the benefit of each proposed component.
Anomaly Detection for Advanced Driver Assistance System with NCDE-based Normalizing Flow
For electric vehicles, the Adaptive Cruise Control (ACC) in Advanced Driver Assistance Systems (ADAS) is designed to assist braking based on driving conditions and user patterns. However, the driving data collected during development are limited and lack diversity, leading to late or aggressive braking. Moreover, it is necessary to effectively identify anomalies in braking patterns, which is critical for autonomous vehicles. We propose Graph Neural Controlled Differential Equation Normalizing Flow (GDFlow), which leverages Normalizing Flow (NF) with Neural Controlled Differential Equations (NCDE) to learn the distribution of normal driving patterns. Our approach captures spatio-temporal information from sensor data and accurately models continuous changes in driving patterns. Additionally, we introduce a quantile-based maximum likelihood objective to improve the likelihood estimate of normal data at the margin of the distribution. We validate GDFlow using real-world electric vehicle driving data that we collected from Hyundai IONIQ5 and GV80EV. Our model achieves state-of-the-art (SOTA) performance compared to nine baselines across four dataset configurations of different vehicle types and drivers. Furthermore, our model outperforms the latest anomaly detection methods across four time series benchmark datasets. Our approach demonstrates superior efficiency in inference time compared to existing methods.
OASIS: Harnessing Diffusion Adversarial Network for Ocean Salinity Imputation using Sparse Drifter Trajectories
Ocean salinity plays a vital role in circulation, climate, and marine ecosystems, yet its measurement is often sparse, irregular, and noisy, especially in drifter-based datasets. Traditional approaches, such as remote sensing and optimal interpolation, rely on linearity and stationarity, and are limited by cloud cover, sensor drift, and low satellite revisit rates. While machine learning models offer flexibility, they often fail under severe sparsity and lack principled ways to incorporate physical covariates without specialized sensors. In this paper, we introduce the OceAn Salinity Imputation System (OASIS), a novel diffusion adversarial framework designed to address these challenges by: (1) employing a transformer-based global dependency capturing module to learn long-range spatio-temporal correlations from sparse trajectories; (2) constructing a generative imputation model that conditions on easily observed tidal covariates to progressively refine imputed salinity fields; and (3) using a scheduler diffusion method to enhance the model's robustness. This unified architecture exploits the periodic nature of tidal signals as a proxy for unmeasured physical drivers, without the need for additional equipment. We evaluate OASIS on four benchmark datasets, including one real-world measurement from Fort Pierce Inlet and three simulated Gulf of Mexico trajectories. Results show consistent improvements over both traditional and neural baselines, achieving up to 52.5% reduction in MAE compared to Kriging. We also develop a lightweight, web-based deployment system that enables salinity imputation through interactive and batch interfaces, available at: https://github.com/yfeng77/OASIS.
SCAlign: Transaction Event Prediction via Multi-Scale Market Dynamics Alignment
Event prediction plays a pivotal role in analyzing consumer behavior for inventory and pricing optimization. In dynamic financial markets, customer behavior is often influenced by price commitment policies, where the historical and pre-announced future transaction price dynamics can lead to complex behavior patterns, such as advance consumption or delayed purchasing. Therefore, these phenomena pose significant challenges to traditional event modeling approaches that rely solely on consumer behaviors. To address this problem, we propose SCAlign, a cross-domain and multi-scale framework for market dynamics alignment, designed for event prediction. Our model integrates both heterogeneous historical and limited observable future commitment prices at different scales, aligning customer behavior with market fluctuations across multiple time scales. Finally, through a Mixture-of-Experts (MoE) framework, the model dynamically fuses these aligned features, enabling adaptive selection of the relevant and appropriate representations for prediction tasks. Empirical evaluations across diverse transaction environments demonstrate that our model outperforms state-of-the-art prediction baselines. Furthermore, it achieves optimal performance across varying data scales, showcasing its robustness and generalizability.
Taming Ultra-Long Behavior Sequence in Session-wise Generative Recommendation
Generative recommendation has emerged as a transformative paradigm in recommender systems, enabling user behavior to be modeled autoregressively without explicit target conditioning. While this approach eliminates the need for target signals, it necessitates compressing extensive historical interactions, potentially spanning lifelong sequences, into coherent interest representations. Conventional methods for handling long sequences typically rely on target-guided search mechanisms (e.g., SIM) to efficiently filter and compress behaviors. However, this strategy is incompatible with generative frameworks due to their target-agnostic nature. To address these challenges, we propose a novel encoder-decoder model named HiCoGen (Hierarchical Compression-based Session-wise Generative Model), which efficiently models long-term interests within a generative framework. In the encoder, HiCoGen compresses behavior sequences using hierarchical content-similarity clustering and employs a hierarchical attention architecture to reduce sequence length while preserving information integrity. In the decoder, HiCoGen uses session-wise generation instead of point-wise generation to better align with industrial short-video applications. To enhance the stability of session-wise generation, we introduce an auxiliary Hierarchical Multi-Token Prediction module. Extensive experiments on public and industrial datasets show significant performance gains over state-of-the-art methods (21.2% on ML-1M and 35.6% on industrial datasets in NDCG@3). We also conducted visualization and performance analysis to explore the advantages of long-sequence modeling.
VocQuiz: Vocabulary Question Generation for English Language Education
Designing effective English vocabulary question generation tools demands a shift from labor-intensive content creation to large language model (LLM) automation that can adapt to varied educational contexts. Current approaches tend to offer a limited variety of question types, which restricts their practical application in real classroom settings. To better meet the demands of English teaching institutions, we present VocQuiz, a vocabulary question generation system that 1) combines the generalization capabilities of LLMs with reliable language resources, including dictionaries, NLP datasets, and authentic corpora, to enhance both contextual relevance and linguistic accuracy; 2) supports multiple question types, such as similar word selection and word collocation, to accommodate various instructional requirements; and 3) employs an iterative workflow to generate and refine questions, ensuring high-quality outputs and consistent assessment standards. VocQuiz offers a practical, deployable solution that helps educators create quiz-based instructional materials, reducing preparation effort while effectively assessing students' mastery of vocabulary.
MHSNet: An MoE-based Hierarchical Semantic Representation Network for Accurate Duplicate Resume Detection with Large Language Model
To maintain the company's talent pool, recruiters need to continuously search for resumes from third-party websites (e.g., LinkedIn, Indeed). However, fetched resumes are often incomplete and inaccurate. To improve the quality of third-party resumes and enrich the company's talent pool, it is essential to conduct duplication detection between the fetched resumes and those already in the company's talent pool. Such duplication detection is challenging due to the semantic complexity, structural heterogeneity, and information incompleteness of resume texts. To this end, we propose MHSNet, a multi-level identity verification framework that fine-tunes BGE-M3 using contrastive learning. With the fine-tuned BGE-M3, MHSNet generates multi-level sparse and dense representations for resumes, enabling the computation of corresponding multi-level semantic similarities. Moreover, a state-aware Mixture-of-Experts (MoE) is employed in MHSNet to handle diverse incomplete resumes. Experimental results verify the effectiveness of MHSNet.
TBGRecall: A Generative Retrieval Model for E-commerce Recommendation Scenarios
Recommendation systems are essential tools in modern e-commerce, facilitating personalized user experiences by suggesting relevant products. Recent advancements in generative models have demonstrated potential in enhancing recommendation systems; however, these models often exhibit limitations in optimizing retrieval tasks, primarily due to their reliance on autoregressive generation mechanisms. Conventional approaches introduce sequential dependencies that impede efficient retrieval, as they are inherently unsuitable for generating multiple items without positional constraints within a single request session. To address these limitations, we propose TBGRecall, a framework integrating Next Session Prediction (NSP), designed to enhance generative retrieval models for e-commerce applications. Our reformulation partitions input samples into multi-session sequences, where each sequence comprises a session token followed by a set of item tokens, and further incorporates multiple optimizations tailored to the generative task in retrieval scenarios. In terms of training methodology, our pipeline integrates limited historical data pre-training with stochastic partial incremental training, significantly improving training efficiency and emphasizing the superiority of data recency over sheer data volume. Our extensive experiments, conducted on public benchmarks alongside a large-scale industrial dataset from TaoBao, show that TBGRecall outperforms state-of-the-art recommendation methods and exhibits a clear scaling law trend. Ultimately, NSP represents a significant advancement in the effectiveness of generative recommendation systems for e-commerce applications.
Stratified Expert Cloning for Retention-Aware Recommendation at Scale
User retention is critical in large-scale recommender systems, significantly influencing online platforms' long-term success. Existing methods typically focus on short-term engagement, neglecting the evolving dynamics of user behaviors over time. Reinforcement learning (RL) methods, though promising for optimizing long-term rewards, face challenges like delayed credit assignment and sample inefficiency. We introduce Stratified Expert Cloning (SEC), an imitation learning framework that leverages abundant interaction data from high-retention users to learn robust policies. SEC incorporates: 1) multi-level expert stratification to model diverse retention behaviors; 2) adaptive expert selection to dynamically match users with appropriate policies based on their state and retention history; and 3) action entropy regularization to enhance recommendation diversity and policy generalization. Extensive offline evaluations and online A/B tests on major video platforms (Kuaishou and Kuaishou Lite) with hundreds of millions of users validate SEC's effectiveness. Results show substantial improvements, achieving cumulative lifts of 0.098% and 0.122% in active days on the two platforms respectively, each translating into over 200,000 additional daily active users.
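A toy rendering of the stratification idea follows, under the assumption that retention quantiles define the strata and a simple classifier stands in for each expert's behavior-cloning policy; the model choice, boundaries, and data are illustrative, not the deployed SEC system.

```python
# Toy sketch of stratified expert cloning: split logged data into retention
# strata, fit one behavior-cloning policy per stratum, and serve a user with the
# policy of the stratum matching their retention level.
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_stratified_policies(states, actions, retention, edges=(0.5, 0.8)):
    """edges: retention quantile boundaries defining low/mid/high strata."""
    bounds = np.quantile(retention, edges)
    strata = np.digitize(retention, bounds)        # 0 = low, ..., len(edges) = high
    policies = {}
    for s in np.unique(strata):
        mask = strata == s
        policies[int(s)] = LogisticRegression(max_iter=1000).fit(states[mask], actions[mask])
    return policies, bounds


def select_policy(policies, bounds, user_retention):
    return policies[int(np.digitize(user_retention, bounds))]


rng = np.random.default_rng(4)
states = rng.normal(size=(3000, 8))
actions = rng.integers(0, 3, size=3000)            # discrete recommendation actions
retention = rng.uniform(size=3000)
policies, bounds = fit_stratified_policies(states, actions, retention)
print(select_policy(policies, bounds, 0.95).predict(states[:2]))
```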
GReF: A Unified Generative Framework for Efficient Reranking via Ordered Multi-token Prediction
In a multi-stage recommendation system, reranking plays a crucial role in modeling intra-list correlations among items. A key challenge lies in exploring optimal sequences within the combinatorial space of permutations. Recent research follows a two-stage (generator-evaluator) paradigm, where a generator produces multiple feasible sequences, and an evaluator selects the best one. In practice, the generator is typically implemented as an autoregressive model. However, these two-stage methods face two main challenges. First, the separation of the generator and evaluator hinders end-to-end training. Second, autoregressive generators suffer from inference efficiency. In this work, we propose a Unified Generative Efficient Reranking Framework (GReF) to address the two primary challenges. Specifically, we introduce Gen-Reranker, an autoregressive generator featuring a bidirectional encoder and a dynamic autoregressive decoder to generate causal reranking sequences. Subsequently, we pre-train Gen-Reranker on the item exposure order for high-quality parameter initialization. To eliminate the need for the evaluator while integrating sequence-level evaluation during training for end-to-end optimization, we propose post-training the model through Rerank-DPO. Moreover, for efficient autoregressive inference, we introduce ordered multi-token prediction (OMTP), which trains Gen-Reranker to simultaneously generate multiple future items while preserving their order, ensuring practical deployment in real-time recommender systems. Extensive offline experiments demonstrate that GReF outperforms state-of-the-art reranking methods while achieving latency that is nearly comparable to non-autoregressive models. Additionally, GReF has also been deployed in a real-world video app Kuaishou with over 300 million daily active users, significantly improving online recommendation quality.
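Ordered multi-token prediction can be sketched generically with k position-specific heads over the decoder state, so one forward pass proposes an ordered block of items. The module below follows that generic pattern; the head design and loss are assumptions, not GReF's exact architecture.

```python
# Sketch of ordered multi-token prediction: k heads map the decoder state at
# step t to the items expected at t+1 .. t+k, with order fixed by head index.
import torch
import torch.nn as nn


class OrderedMultiTokenHead(nn.Module):
    def __init__(self, d_model: int, vocab_size: int, k: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(k))

    def forward(self, hidden):                      # hidden: [batch, d_model]
        # Stack per-offset logits: [batch, k, vocab_size].
        return torch.stack([head(hidden) for head in self.heads], dim=1)


def omtp_loss(logits, targets):                     # targets: [batch, k]
    b, k, v = logits.shape
    return nn.functional.cross_entropy(logits.reshape(b * k, v), targets.reshape(b * k))


head = OrderedMultiTokenHead(d_model=64, vocab_size=1000, k=4)
hidden = torch.randn(8, 64)
targets = torch.randint(0, 1000, (8, 4))
print(omtp_loss(head(hidden), targets).item())
```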
AutoDW-TS: Automated Data Wrangling for Time-Series Data
Data wrangling, the process of preparing raw data for analysis through cleansing, transformation, and enrichment, is a critical step in the data science pipeline. Its importance is amplified for time-series data, which underpins many applications, with forecasting being one of the most prominent tasks. Yet, current practices remain largely manual, time-consuming, and error-prone, limiting productivity and scalability. In this paper, we introduce AutoDW-TS, an automated approach to time-series data wrangling powered by Large Language Models (LLMs). Our method offers an end-to-end pipeline, automating key stages such as table merging, prediction engineering, cleansing, imputation, and enrichment. To support diverse use cases, we developed multiple systems, including an interactive AutoDW-TS WebApp, Web APIs, and an AI agent. We share insights from developing and deploying these systems, along with results from an extensive evaluation across 38 time-series benchmarks. Our findings show that AutoDW-TS significantly improves forecasting performance, demonstrating its effectiveness and potential to transform time-series data preparation at scale.
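A few of the wrangling steps such a pipeline automates (resampling, imputation, enrichment) look roughly like the pandas sketch below; the column names, frequency, and values are made-up illustrations, not AutoDW-TS output.

```python
# Sketch of typical time-series wrangling steps: resample an irregular series to
# a daily grid, impute gaps, and add lag/calendar features for forecasting.
import numpy as np
import pandas as pd

ts = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05", "2024-01-06"]),
    "demand": [10.0, 12.0, np.nan, 15.0],
}).set_index("timestamp")

daily = ts.resample("D").mean()                                 # regularize to a daily grid
daily["demand"] = daily["demand"].interpolate(method="time")    # impute gaps
daily["demand_lag_1"] = daily["demand"].shift(1)                # enrichment: lag feature
daily["dow"] = daily.index.dayofweek                            # enrichment: calendar feature
print(daily)
```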
Prompt Tuning as User Inherent Profile Inference Machine
Large Language Models (LLMs) have exhibited significant promise in recommender systems by empowering user profiles with their extensive world knowledge and superior reasoning capabilities. However, LLMs face challenges like unstable instruction compliance, modality gaps, and high inference latency, leading to textual noise and limiting their effectiveness in recommender systems. To address these challenges, we propose UserIP-Tuning, which uses prompt-tuning to infer user profiles. It integrates the causal relationship between user profiles and behavior sequences into LLMs' prompts. It employs Expectation Maximization (EM) to infer the embedded latent profile, minimizing textual noise by fixing the prompt template. Furthermore, a profile quantization codebook bridges the modality gap by categorizing profile embeddings into collaborative IDs pre-stored for online deployment. This improves time efficiency and reduces memory usage. Experiments show that UserIP-Tuning outperforms state-of-the-art recommendation algorithms. An industry application confirms its effectiveness, robustness, and transferability. The presented solution has been deployed in Huawei AppGallery's Explore page since May 2025, serving 2 million daily active users, delivering significant improvements in real-world recommendation scenarios. The code is publicly available for replication at https://github.com/Applied-Machine-Learning-Lab/UserIP-Tuning.
TRAWL: External Knowledge-Enhanced Recommendation with LLM Assistance
Combining semantic information with behavioral data is a crucial research area in recommender systems. A promising approach involves leveraging external knowledge to enrich behavioral-based recommender systems with abundant semantic information. However, this approach faces two primary challenges: (1) denoising raw external knowledge and (2) adapting semantic representations. To address these challenges, we propose exTernal knowledge-enhanced RecommendAtion With LLM assistance (TRAWL). This method utilizes large language models to extract relevant recommendation knowledge from raw external data and employs a contrastive learning strategy for adapter training. Experiments on public datasets and real-world online recommender systems validate the effectiveness of our approach.
QARM: Quantitative Alignment Multi-Modal Recommendation at Kuaishou
In recent years, with the significant evolution of multi-modal large models, many recommendation researchers have realized the potential of multi-modal information for user interest modeling. In industry, a widely used modeling architecture is a cascading paradigm: (1) first pre-training a multi-modal model to provide omnipotent representations for downstream services; (2) the downstream recommendation model then takes the multi-modal representation as additional input to fit real user-item behaviours. Although this paradigm achieves remarkable improvements, two problems still limit model performance: (1) Representation Unmatching: the pre-trained multi-modal model is supervised by classic NLP/CV tasks, while the recommendation model is supervised by real user-item interactions. As a result, the goals of the two fundamentally different tasks remain separate, and their representations lack a consistent objective. (2) Representation Unlearning: the generated multi-modal representations are stored in a cache and serve as fixed extra input to the recommendation model, so they cannot be updated by the recommendation model's gradients, which is unfavorable for downstream training. Motivated by these two challenges in downstream usage, we introduce a quantitative multi-modal framework to customize specialized and trainable multi-modal information for different downstream models. Specifically, we introduce two insightful modifications to enhance this framework: (1) Item Alignment, which transforms the original multi-modal representations to match the real user-item behaviour distribution; (2) Quantitative Code, which transforms the aligned multi-modal representations into trainable code IDs for downstream tasks. We conduct detailed experiments and ablation analyses to demonstrate the effectiveness of QARM. Our method has been deployed on Kuaishou's various services, serving 400 million users daily.
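The "Quantitative Code" step amounts to nearest-codeword quantization of aligned item embeddings into discrete IDs that a downstream model can embed and train. The sketch below shows that generic operation with an untrained random codebook; codebook size, learning procedure, and names are assumptions, not Kuaishou's code.

```python
# Minimal sketch of the "code ID" idea: quantize each aligned multi-modal item
# embedding to its nearest codeword, then feed the resulting discrete ID to a
# downstream trainable embedding table.
import numpy as np


def quantize_to_code_ids(item_emb: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """item_emb: [N, d]; codebook: [K, d]; returns [N] integer code IDs."""
    # Squared-distance argmin via ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2.
    dots = item_emb @ codebook.T
    dists = (item_emb ** 2).sum(1, keepdims=True) - 2 * dots + (codebook ** 2).sum(1)
    return dists.argmin(axis=1)


rng = np.random.default_rng(5)
codebook = rng.normal(size=(256, 64))     # 256 codewords (learned in practice)
item_emb = rng.normal(size=(10, 64))      # aligned multi-modal item vectors
code_ids = quantize_to_code_ids(item_emb, codebook)
print(code_ids)                           # discrete IDs usable as trainable features
```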
SMTIR: Scenario-Aware Multi-Trigger Induction Network for CTR Prediction
Trigger-Induced Recommendation (TIR), which aims to predict user interest based on a trigger item, has gained considerable traction on e-commerce platforms. Current TIR methods typically analyze user intent by integrating explicit interest in the trigger item and implicit interest derived from user historical behaviors. However, these methods often overlook the contextual information and occurring scenarios related to the trigger, resulting in an undue emphasis on isolated trigger items and a consequently restrictive understanding of users' short-term intentions. To address these challenges, we propose a novel scenario-aware multi-trigger induction method featuring three key enhancements: (1) The Context Modeling Network learns contextual information associated with the trigger during the request, improving the understanding of users' real intentions regarding the trigger item; (2) The Multi-Trigger Learning Network introduces user latent triggers from various scenarios to uncover users' potential external preferences; (3) The Scenario Induction Network captures the characteristics of the scenarios in which triggers occur and performs induction to yield scenario-aware user intentions prediction. We validate our approach through experiments on multiple industrial datasets, demonstrating the model's effectiveness. Furthermore, we have integrated the model into an online advertising system, achieving a 5.46% improvement in Click-Through Rate (CTR).
IclForge: Enhancing In-Context Learning with Evolutionary Algorithms under Budgeted Annotation
In-context learning (ICL) has emerged as a powerful paradigm for adapting Large Language Models (LLMs) to specific tasks without parameter updates. While various strategies exist for selecting relevant ICL exemplars from a labeled pool, the fundamental challenge of constructing this high-quality pool remains largely unexplored, especially for new tasks or domains with limited labeled data. We present IclForge, a novel active learning framework that efficiently selects informative examples from unlabeled datasets to be annotated and included in the ICL pool. Unlike traditional active learning methods that optimize for individual example informativeness, IclForge explicitly considers the interdependence of examples within the ICL context. Through extensive experiments across diverse datasets and LLM architectures, we show that IclForge outperforms standard active learning baselines by +180-450 basis points while requiring 50% fewer annotations. Our framework is complementary to existing ICL selection strategies and extends naturally to generative applications, which we demonstrate through experiments on Math Word Problem (MWP) tasks. These results highlight IclForge's effectiveness in constructing high-quality ICL exemplar pools in resource-constrained scenarios.
Next-Generation Price Recommendation with LLM-Augmented Graph Transformers
Dynamic pricing on two-sided platforms such as Airbnb presents complex challenges due to the heterogeneity of listings, user behaviours, and contextual variables. In this work, we propose a robust and interpretable pricing framework that leverages Large Language Models (LLMs) and prompt engineering to automate the generation of high-level meta-features from unstructured and structured listing data. These meta-features are designed to capture nuanced semantic features that are often overlooked by traditional feature engineering pipelines. We further integrate these representations into a Transformer-based Graph Neural Network (GNN), which models the relational and spatial dependencies between listings in a data-driven manner using several relation-construction strategies. By combining prompt-driven embeddings with graph-aware contextual learning, our framework significantly enhances price recommendation accuracy while offering transparency through assortativity analysis. Extensive experiments on real-world Airbnb datasets demonstrate our approach's performance in prediction, generalization to unseen data across neighbourhoods, and output interpretability. This work highlights the potential of unifying LLMs, structured graph learning, and interpretable AI for next-generation dynamic pricing systems.
On the Gap Between Diffusion and Transformer Multi-Tabular Generation
Shareable tabular data is of high importance in industry and research. While generating synthetic records is well-studied, research has only recently extended to relational data synthesis. In the tabular generation setting, diffusion and transformer models exhibit superior performance over prior art. However, in the relational setting, diffusion models outperform transformers. This work focuses on the performance gap between tabular transformers and diffusion models in single-table (tabular) and multi-table (relational) settings, using REaLTabformer and ClavaDDPM as representative state-of-the-art models. We evaluate these architectures on a set of single- and multi-table datasets, highlighting the root causes of the gap between the methods. In our experiments, we attribute this difference to the influence of contextual information and data representation. To bridge the gap in the relational setting, we propose two seemingly simple strategies: layer sharing and contextual cues. This work offers insights into key design considerations for single- and multi-table generative models, including the incorporation of contextual information and the reuse of existing knowledge. With the proposed methods, we achieve improvements of 1.52× and 1.94× for the Logistic Detection and Discriminator Measure metrics, respectively.
MOHPER: Multi-objective Hyperparameter Optimization Framework for E-commerce Retrieval System
E-commerce retrieval and ranking optimization have expanded to incorporate broader metrics that capture user engagement and business objectives. Modern search frameworks now incorporate advanced quality features, such as sales counts and document-query relevance, to better align search results with these goals. Traditional methods typically focus on click-through rate (CTR) as a measure of engagement or relevance, but this can miss true purchase intent, creating a gap between user interest and actual conversions. Joint training with the click-through conversion rate (CTCVR) has become essential for understanding buying behavior, although its sparsity poses challenges for reliable optimization. This study presents MOHPER, a Multi-Objective HyperParameter optimization framework for E-commerce Retrieval systems. Using Bayesian optimization and sampling, it jointly optimizes CTR, CTCVR, and other relevant objectives, with a focus on user engagement and conversion. To enhance configuration selection in multi-objective optimization, we propose advanced hyperparameter selection methods, including a meta-configuration voting strategy and a cumulative training approach that leverages prior optima to improve training efficiency and performance. Currently deployed in a live setting, our proposed framework substantiates its practical efficacy in achieving a balanced optimization that aligns with both user satisfaction and revenue goals.
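As a rough illustration of jointly optimizing engagement- and conversion-style objectives as described above, the sketch below uses Optuna's multi-objective study interface. It is not the MOHPER framework, and the two evaluation functions are hypothetical placeholders for offline metrics of a retrieval configuration; note that Optuna's default multi-objective sampler is evolutionary, though Bayesian-style samplers can be substituted.

    import optuna

    def evaluate_ctr(params):      # placeholder offline engagement metric
        return 1.0 - (params["title_boost"] - 1.2) ** 2

    def evaluate_ctcvr(params):    # placeholder offline conversion metric
        return 1.0 - (params["sales_weight"] - 0.7) ** 2

    def objective(trial):
        params = {
            "title_boost": trial.suggest_float("title_boost", 0.5, 3.0),
            "sales_weight": trial.suggest_float("sales_weight", 0.0, 2.0),
        }
        return evaluate_ctr(params), evaluate_ctcvr(params)

    study = optuna.create_study(directions=["maximize", "maximize"])
    study.optimize(objective, n_trials=100)
    for t in study.best_trials:    # Pareto-optimal retrieval configurations
        print(t.values, t.params)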
Expert-Guided Diffusion Planner for Auto-Bidding
Auto-bidding is widely used in advertising systems, serving a diverse range of advertisers. Generative bidding is increasingly gaining traction due to its strong planning capabilities and generalizability. Unlike traditional reinforcement learning-based bidding, generative bidding does not depend on the Markov Decision Process (MDP), thereby exhibiting superior planning performance in long-horizon scenarios. Conditional diffusion modeling approaches have shown significant promise in the field of auto-bidding. However, relying solely on return as the optimality criterion is insufficient to guarantee the generation of truly optimal decision sequences, as it lacks personalized structural information. Moreover, the auto-regressive generation mechanism of diffusion models inherently introduces timeliness risks. To address these challenges, we introduce a novel conditional diffusion modeling approach that integrates expert trajectory guidance with a skip-step sampling strategy to improve generation efficiency. The efficacy of this method has been demonstrated through comprehensive offline experiments and further substantiated by statistically significant outcomes in online A/B testing, yielding an 11.29% increase in conversions and a 12.36% growth in revenue relative to the baseline.
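To make the skip-step idea above concrete, a sampler can visit only a strided subset of diffusion timesteps rather than all of them, trading a small amount of fidelity for much lower latency. The sketch below is a generic illustration, not the paper's method; denoise_step is a hypothetical callable standing in for one conditional reverse-diffusion update (e.g., a DDIM-style jump from one timestep to the next).

    import torch

    @torch.no_grad()
    def skip_step_sample(denoise_step, shape, total_steps=1000, skip=10, cond=None):
        """Sample by visiting roughly total_steps / skip timesteps instead of all of them."""
        timesteps = list(range(total_steps - 1, -1, -skip))      # e.g. 999, 989, ..., 9
        x = torch.randn(shape)                                   # start from pure noise
        for t_cur, t_next in zip(timesteps, timesteps[1:] + [0]):
            x = denoise_step(x, t_cur, t_next, cond)             # one coarse denoising jump
        return x

    # Toy usage with a dummy "denoiser" that just shrinks the noise at each jump.
    plan = skip_step_sample(lambda x, t_cur, t_next, cond: 0.9 * x, shape=(1, 48, 4))
    print(plan.shape)                                            # torch.Size([1, 48, 4])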
Thematic Bottleneck Models for Multimodal Analysis of School Attendance
Regular school attendance is critical for young people, supporting academic achievement, social development, and the cultivation of lifelong habits. Existing research for analysing attendance patterns often relies on structured survey data targeted at parents and teachers, which overlooks students' perspectives and experiences. To address this gap, our team developed and deployed the Our Journey platform, which enables young people to share their experiences through multimodal responses such as texts and images, offering unique insights into the factors influencing school attendance. The data is linked to official attendance records from the Ministry of Education, allowing the modelling of attendance outcomes based on students' input. To effectively analyse the data, we propose Thematic Bottleneck Models (TBMs) to enhance the understanding of the subjective experiences behind the data and the interpretability of attendance modelling. TBMs introduce qualitative concepts as intermediate labels, mapping multimodal data to qualitative insights from thematic analysis before predicting the outcomes. Attendance modelling with TBMs outperforms existing multimodal methods in predicting attendance percentage and persistent absenteeism. Analysis of themes within TBMs reveals motivational and contextual factors associated with regular attendance and persistent absenteeism. The findings are used to inform education policy and guide strategies to support student engagement in New Zealand.
Spatial Semantic-based Enhanced Address Parsing via Adaptive Weighted Learning
Address parsing is an essential task that transforms natural language descriptions into standardized addresses, crucial for numerous urban applications. Existing methods struggle with ambiguous expressions, and even Large Language Models face challenges adapting to specialized domains with limited data. In this study, we focus on developing a robust framework to map diverse address descriptions into a unified semantic space of standardized addresses. We propose the Adaptive Weighted Learning-based Address Parsing (AWLAP) framework, which enhances parsing effectiveness through two key components: a multi-level constrained classifier that mines correlations between geographic entities across hierarchies, and an integrated discriminator that adaptively guides optimization based on parsing complexity. We evaluate AWLAP using real data from JD Logistics and Point-of-Interest addresses. Extensive experiments comparing against state-of-the-art methods demonstrate AWLAP's effectiveness and robustness in address parsing. The proposed AWLAP framework has been successfully deployed as an address parsing service in practical applications.
Zipf-Gramming: Scaling Byte N-Grams Up to Production Sized Malware Corpora
A classifier using byte n-grams as features is the only approach we have found fast enough to meet requirements in size (sub 2 MB), speed (multiple GB/s), and latency (sub 10 ms) for deployment in numerous malware detection scenarios. However, we have consistently found that 6-8 grams achieve the best accuracy on our production deployments, but we have been unable to deploy regularly updated models due to the high cost of finding the top-k most frequent n-grams over terabytes of executable programs. Because the Zipfian distribution models the distribution of n-grams well, we exploit its properties to develop a new top-k n-gram extractor that is up to 35× faster than the previous best alternative. Using our new Zipf-Gramming algorithm, we are able to scale up our production training set and obtain up to a 30% improvement in AUC at detecting new malware. We show theoretically and empirically that our approach selects the top-k items with little error, and we describe the interplay between theory and engineering required to achieve these results.
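For context, the expensive step the abstract refers to is exact top-k extraction of frequent byte n-grams over a corpus. The sketch below is the naive exact baseline (here counting each distinct n-gram once per file), not the Zipf-Gramming algorithm itself, which avoids materializing the full count table by exploiting the Zipfian shape of the frequency distribution.

    from collections import Counter
    import heapq

    def top_k_byte_ngrams(file_paths, n=6, k=1000):
        """Exact top-k byte n-grams by document frequency; memory grows with corpus size."""
        counts = Counter()
        for path in file_paths:
            with open(path, "rb") as f:
                data = f.read()
            # count each distinct n-gram once per file (document frequency)
            counts.update({data[i:i + n] for i in range(len(data) - n + 1)})
        return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])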
HuggingGraph: Understanding the Supply Chain of LLM Ecosystem
Large language models (LLMs) leverage deep learning architectures to process and predict sequences of words, enabling them to perform a wide range of natural language processing tasks, such as translation, summarization, question answering, and content generation. As existing LLMs are often built from base models or other pre-trained models and use external datasets, they can inevitably inherit vulnerabilities, biases, or malicious components that exist in previous models or datasets. Therefore, it is critical to understand these components' origin and development process to detect potential risks, improve model fairness, and ensure compliance with regulatory frameworks. Motivated by this, this project aims to study the relationships between models and datasets, which are the central parts of the LLM supply chain. First, we design a methodology to systematically collect LLMs' supply chain information. Then, we model the relationships between models and datasets as a directed heterogeneous graph with 402,654 nodes and 462,524 edges. Lastly, we perform different types of analysis and report several interesting findings.
Bridging the Gap Between Sparsity and Redundancy: A Dual-Decoding Framework with Global Context for Map Inference
Trajectory data has become a key resource for automated map inference due to its low cost, broad coverage, and continuous availability. However, uneven trajectory density often leads to fragmented roads in sparse areas and redundant segments in dense regions, posing significant challenges for existing methods. To address these issues, we propose DGMap, a dual-decoding framework with global context awareness, featuring Multi-scale Grid Encoding, Mask-enhanced Keypoint Extraction, and Global Context-aware Relation Prediction. By integrating global semantic context with local geometric features, DGMap improves keypoint detection accuracy to reduce road fragmentation in sparse-trajectory areas. Additionally, the Global Context-aware Relation Prediction module suppresses false connections in dense-trajectory regions by modeling long-range trajectory patterns. Experimental results on three real-world datasets show that DGMap outperforms state-of-the-art methods by 5% in APLS, with notable performance gains on trajectory data from the Didi Chuxing platform.
LinkML for Collaborative Petrochemical Knowledge Graph Development
LinkML is an emerging ontology modeling framework using YAML syntax instead of the typical semantic web technologies such as OWL. LinkML's approachable syntax allows a wide range of stakeholders, including domain experts, software engineers, and data scientists, to collaboratively define and refine a shared semantic model. We leveraged LinkML to develop a 1,200+ class ontology capturing complex physical infrastructure, emissions sources, equipment relationships, and time-series observations critical to source-level emissions attribution. We developed tooling to automate transformation of LinkML into graph database schemas, entity-relationship (ER) diagrams, documentation, and typed domain model code, supporting rapid iteration and semantic consistency across systems. The resulting large-scale knowledge graph, deployed in a property graph database, materializes digital twins for over 700 petrochemical facilities and more than 61,000 pieces of equipment. This approach dramatically simplifies emissions analytics by standardizing data integration and replacing fragile, bespoke SQL logic with intuitive graph queries. The accessibility of LinkML enables domain experts to directly contribute to the model, while providing a robust foundation for engineers and data scientists to perform scalable, reliable analytics. The result is a unified platform for emissions reporting and digital transformation. This paper presents the use of LinkML for semantic modeling for emissions reporting in the petrochemical industry, the tooling supporting its deployment, and the challenges and successes of the approach.
AutoCoRe-FL: Automatic Concept-based Rule Reasoning in Federated Learning
Federated learning (FL) enables decentralized model training without centralizing raw data, yet achieving interpretability under such constraints remains challenging. We propose AutoCoRe-FL, a framework for interpretable FL that eliminates the need for predefined or manually labeled concepts. In AutoCoRe-FL, each client automatically extracts high-level visual concepts (clusters of semantically coherent image regions that correspond to human-understandable properties) using local segmentation, self-supervised representation learning, and clustering. These concepts are used to encode data as interpretable vectors, from which clients train symbolic models that generate rule-based explanations. The server then aggregates these rules through an iterative, communication-efficient process to build a global, coherent, and transparent model. Experiments on benchmark datasets demonstrate that AutoCoRe-FL produces accurate symbolic explanations while achieving competitive predictive performance. Notably, it outperforms LR-XFL, the current state-of-the-art interpretable FL baseline that relies on predefined concept supervision, in both rule quality and classification accuracy.
PRECISE: Pre-training and Fine-tuning Sequential Recommenders with Collaborative and Semantic Information
Recommendation platforms commonly offer diverse content scenarios for users to interact with. Pre-training models are the most commonly used approach in recommendation systems to capture users' full-domain interests. Traditional ID-based pre-training models mainly capture user interests by leveraging collaborative signals. However, a prevalent drawback of those systems is the incapacity to handle cold-start scenarios. With the recent advent of large language models, there has been a significant increase in research efforts exploiting LLMs to extract semantic information for items. However, text-based recommendations highly rely on elaborate feature engineering and often fail to capture collaborative similarities. To overcome these limitations, we propose a novel pre-training and fine-tuning framework for sequential recommendation, termed Precise. Precise employs a pre-training framework that models users' comprehensive interests across all recommendation scenarios by combining collaborative signals with semantic information. To address rapidly shifting data distributions in recommendation scenarios, we further propose a fine-tuning phase tailored to specific target scenarios/tasks, thereby achieving efficient industrial deployment while maintaining fast responsiveness. Additionally, we introduce practical training strategies to enhance the model's performance in real-world applications. Empirical findings reveal that the Precise framework attains outstanding performance in both offline experiments and online A/B tests. Precise has been fully deployed in multiple online recommendation scenarios in WeChat.
LinkedIn Post Embeddings: Industrial Scale Embedding Generation and Usage across LinkedIn
A post embedding (representation of text in embedding space that effectively captures semantic meaning) is a foundational component of LinkedIn that is consumed by product surfaces in retrieval and ranking (e.g., ranking posts in the feed or video tab). This paper presents the post embeddings used at LinkedIn, where a pre-trained transformer-based large language model (LLM) is used as the base model and fine-tuned using multi-task learning across a diverse set of semantic labeling tasks. We observe positive transfer, leading to improved performance across all tasks, compared to training them independently. The generated post embeddings outperform baseline models in zero-shot learning, demonstrating their potential for broader applicability. Furthermore, the generated post embeddings' performance surpasses that of OpenAI's ADA-001 and ADA-002 embeddings on LinkedIn-specific datasets and tasks. We also describe the offline evaluation methodology and the deployment to our near-line infrastructure, which makes the post embedding available for use within minutes of post creation for any downstream application. We present how the embeddings were applied in the Feed product surface, in both ranking and retrieval stages, and showcase the real-world online impact to demonstrate the superior performance of these embeddings. Finally, we also share the results of applying the embeddings to the retrieval system of our video ranking product surface in LinkedIn. These embeddings have been battle-tested in production at LinkedIn for over two years, consistently powering multiple products.
Towards Explainable Transaction Risk Analysis With Dual Graph Retrieval Augmented Generation
Explainable transaction risk analysis is a challenge for traditional deep learning models, which only predict suspicious transactions without explanations. Current explainable methods rely on hand-crafted rules and lack the ability to automatically generate language-based explanations. Large Language Models (LLMs) offer promise due to their reasoning and text generation abilities but struggle with domain knowledge and hallucinations, making risk analysis difficult. Specifically, LLMs face: (1) insufficient adaptation to transaction data analysis, and (2) ineffective knowledge retrieval methods that ignore the rich graph structure of transaction data. To address these issues, we propose the Dual Graph Retrieval-Augmented Generation (Dual-gRAG) framework, which utilizes dual retrieval: expert knowledge retrieval and reasoning case retrieval. Expert knowledge compensates for domain gaps, while reasoning case retrieval provides step-wise analysis guidance. We incorporate both graph-structured features and semantic features into the retrieval process to enhance retrieval effectiveness. Extensive experiments show that Dual-gRAG improves LLMs' risk analysis capabilities, achieving a 15% increase across different metrics.
Learning to Comparison-Shop
In online marketplaces like Airbnb, users frequently engage in comparison shopping before making purchase decisions. Despite the prevalence of this behavior, a significant disconnect persists between mainstream e-commerce search engines and users' comparison needs. Traditional ranking models often evaluate items in isolation, disregarding the context in which users compare multiple items on a search results page. While recent advances in deep learning have sought to improve ranking accuracy, diversity, and fairness by encoding listwise context, the challenge of aligning search rankings with user comparison shopping behavior remains inadequately addressed. In this paper, we propose a novel ranking architecture, the Learning-to-Comparison-Shop (LTCS) system, that explicitly models and learns users' comparison shopping behaviors. Through extensive offline and online experiments, we demonstrate that our approach yields statistically significant gains in key business metrics (improving NDCG by 1.7% and boosting booking conversion rate by 0.6% in A/B testing) while also enhancing user experience. We also compare our model against state-of-the-art approaches and demonstrate that LTCS significantly outperforms them.
Locale-Aware Product Type Prediction for E-commerce Search Queries
Search query understanding (QU) is an important building block of modern e-commerce search engines. QU extracts multiple intents from customer queries, including intended color, brand, etc. One of the most important tasks in QU is predicting which product category the user is interested in. In this work, we address the query product type classification (Q2PT) task. Compared to the classification of full-fledged texts, Q2PT is more complicated because of the ambiguity of short search queries, which is aggravated by language and cultural differences across worldwide online stores. Moreover, the span and variety of product categories in modern marketplaces pose a significant challenge. We focus on Q2PT inference in global multi-locale e-commerce markets, which must deliver a high-quality user experience in large and small local stores alike. The common approach of training Q2PT models for each locale separately shows significant performance drops in low-resource stores and prevents easy expansion to a new country, where the Q2PT model would have to be created from scratch. We use transfer learning to address this challenge, augmenting low-resource locales through the vast knowledge of the high-resource ones. We introduce a unified, locale-aware Q2PT model, sharing training data and model structure across worldwide stores. We show that the proposed unified locale-aware Q2PT model has superior performance over the alternatives by conducting extensive quantitative and qualitative analysis on a large-scale multilingual e-commerce dataset across 20 worldwide locales. Our online A/B tests show that the locale-aware model improves on the previous user experience, increasing customer satisfaction.
GeoIndia V2: A Unified Graph and Language Model for Context-Aware Geocoding
Geocoding in India presents unique challenges due to the unstructured, multilingual, and diverse nature of its address systems. While recent advances in geospatial AI have explored the combination of spatial and semantic cues, existing methods often fall short in effectively integrating both dimensions for robust address resolution. In this work, we propose GeoIndia-V2, an enhanced version of GeoIndia [21] that unifies geospatial and semantic modeling through a novel fusion framework. Our unified model combines the Graphormer architecture [27] and a Pre-trained Transformer-based Language Model (PTLM) that is trained from scratch on proprietary Indian address data, using our proposed Key Modulated Cross-Attention (KMCA) mechanism. KMCA enables deep cross-modal interaction between geospatial topology and linguistic structure and allows the model to reason contextually across both geographic and textual dimensions, effectively handling the semantic intricacies of Indian addresses, including colloquial usage, inconsistent formatting, and multilinguality. We leverage last-mile e-commerce delivery data to construct a fine-grained graph of neighbourhood connectivity, enabling Graphormer to capture rich spatial relationships. Unlike prior methods that rely on self-loops, we generate graphs dynamically at inference time to exploit Graphormer's topological strength. Additionally, we introduce a generative decoding strategy for predicting hierarchical H3 cells (https://www.uber.com/en-IN/blog/h3/), moving beyond conventional bit-wise classification approaches. To the best of our knowledge, this is the first method to explicitly fuse graph-based geospatial learning with language-driven semantic modeling via cross-attention in the Indian geocoding context. Our approach significantly outperforms existing solutions and marks a substantial advancement toward building scalable real-world geocoding systems for complex address ecosystems like India.
End-to-end Information Extraction from Archival Records with Multimodal Large Language Models
Semi-structured Document Understanding presents a challenging research task due to the significant variations in layout, style, font, and content of documents. This complexity is further amplified when dealing with born-analogue historical documents, such as digitised archival records, which contain degraded print, handwritten annotations, stamps, marginalia and inconsistent formatting resulting from historical production and digitisation processes. Traditional approaches for extracting information from semi-structured documents rely on manual labour, making them costly and inefficient. This is partly due to the fact that within document collections, there are various layout types, each requiring customised optimisation to account for structural differences, which substantially increases the effort needed to achieve consistent quality. The emergence of Multimodal Large Language Models (MLLMs) has significantly advanced Document Understanding by enabling flexible, prompt-based understanding of document images, without requiring OCR outputs or layout encodings. Moreover, the encoder-decoder architectures have overcome the limitations of encoder-only models, such as reliance on annotated datasets and fixed input lengths. However, there still remains a gap in effectively applying these models in real-world scenarios. To address this gap, we first introduce BZKOpen, a new annotated dataset designed for key information extraction from historical German index cards. Furthermore, we systematically assess the capabilities of several state-of-the-art MLLMs, including the open-source InternVL2.0 and InternVL2.5 series and the commercial GPT-4o-mini, on the task of extracting key information from these archival documents. Both zero-shot and few-shot prompting strategies are evaluated across different model configurations to identify the optimal conditions for performance. Interestingly, our results reveal that increasing model size does not necessarily lead to better performance on this dataset. Among all models tested, the open-source InternVL2.5-38B consistently achieves the most robust results, outperforming both larger InternVL models and the proprietary alternative. We further provide practical insights into prompt engineering and inference settings, offering guidance for applying MLLMs to real-world key information extraction tasks. Additionally, we highlight the need for more ground truth datasets that include a wider range of historical documents with varying quality and in multiple languages, in order to fully explore the potential and limitations of MLLMs for key information extraction from historical records.
DinoCompanion: An Attachment-Theory Informed Multimodal Robot for Emotionally Responsive Child-AI Interaction
Emotional development of children fundamentally relies on secure attachment relationships, yet current AI companions lack the theoretical foundation to provide developmentally appropriate emotional support. We introduce DinoCompanion, the first attachment-theory-grounded multimodal robot for emotionally responsive child-AI interaction. We address three critical challenges in child-AI systems: the absence of developmentally-informed AI architectures, the need to balance engagement with safety, and the lack of standardized evaluation frameworks for attachment-based capabilities. Our contributions include: (i) a multimodal dataset of 128 caregiver-child dyads containing 125,382 annotated clips with paired preference-risk labels, (ii) CARPO (Child-Aware Risk-calibrated Preference Optimization), a novel training objective that maximizes engagement while applying epistemic-uncertainty-weighted risk penalties, and (iii) AttachSecure-Bench, a comprehensive evaluation benchmark covering ten attachment-centric competencies with strong expert consensus. On AttachSecure-Bench, DinoCompanion achieves state-of-the-art performance (57.15%), outperforming GPT-4o and Gemini-2.5-Pro, with exceptional secure-base behaviors and superior attachment risk detection. Ablations validate the critical importance of multimodal fusion, uncertainty-aware risk modeling, and hierarchical memory for coherent, emotionally attuned interactions.
SolarMAE: A Unified framework for Regional Centralized and Distributed Solar Power Forecasting with Weather Pre-training
The recent surge in solar plant installations has notably decreased the reliance on fossil fuels while also presenting significant challenges to the power grid. Therefore, the accurate forecasting of centralized and distributed solar power has become critically important. Although site-specific forecasting models typically perform better for utility-scale solar power plants, the model maintenance can be troublesome as the number of solar plants grows. Furthermore, the rapid growth and difficulties in real-time data collection associated with distributed solar systems exacerbate the complexity of regional gross solar power forecasting. To address these issues, we propose SolarMAE, a unified regional solar power forecasting framework enabling end-to-end precise forecasting for both centralized and distributed solar systems. It first adopts a masked autoencoder (MAE) pre-training strategy for numerical weather prediction (NWP) reconstruction, aiming to derive spatiotemporal correlations within meteorological variables, and then fine-tunes a temporal convolutional neural network which predicts future solar power generation. Experiments show that this framework outperforms state-of-the-art centralized or distributed solar power forecasting methods in accuracy, and significantly reduces model maintenance cost. It also demonstrates strong few-shot learning capabilities, which are particularly useful for the cold start problem of newly installed solar plants. The unified solar power forecasting system has been deployed in a province in eastern China, serving solar systems with over 73 GW gross installed capacity and more than 400 centralized solar plants.
GCVPN: A Graph Convolutional Visual Prior-Transform Network for Actual Occluded Image Recognition
Image recognition plays a critical role in urban security, traffic management, and environmental monitoring, yet achieving high accuracy in obstructed scenes remains a challenge. To address this, we propose a Graph Convolutional Visual Prior-Transform Network (GCVPN), which significantly improves recognition accuracy and efficiency in complex environments. GCVPN introduces an image prior slicing and topology transformer to convert image data into graph-structured slice features, integrating domain overlap sampling and planar mapping to handle symmetry and enable precise, rapid anomaly detection. By combining a traditional VGG backbone with graph convolutional layers, GCVPN jointly captures topological relationships and feature semantics, while maintaining real-time efficiency with continuous recognition at 30 video frames per second. Extensive experiments demonstrate its effectiveness in photovoltaic panel anomaly detection and face occlusion recognition, highlighting strong potential for applications in intelligent surveillance and autonomous driving.
Fraudulent Delivery Detection with Multimodal Courier Behavior Data in Last-Mile Delivery
The rapid growth of e-commerce has made last-mile delivery a critical service in daily life. Despite regulations mandating doorstep delivery, the pressure of penalties for delays can lead to fraudulent delivery behaviors, where couriers may report package receipt without actually delivering the package to the assigned locations. Existing studies focus on exploring user (courier) behaviors for fraud behavior detection. However, due to the inaccuracy of GPS positioning and the variability of user behavior patterns caused by dynamic environmental factors, relying solely on behavior data remains insufficient for detecting fraudulent deliveries. In this paper, we present a Multimodal Fraudulent Delivery Detection framework (MFDD), which integrates heterogeneous data from multiple agents (courier-side and user-side), including couriers' physical behavior, digital behavior, and conversations containing customer feedback, for detecting fraudulent deliveries in the last-mile delivery. We employ attention mechanisms to extract features from each modality and use cross-modal fusion to capture complex and varied relationships between multimodal data. To further mitigate modality imbalance during training, we introduce a dynamic gradient-modulation strategy that balances learning across all modalities. We implement and evaluate MFDD on real-world, human-annotated data, achieving a 9.6% improvement in precision and a 5.8% increase in accuracy over the state-of-the-art methods. We also deploy the model in the production environment of JD Logistics, and results show that compared to existing methods, MFDD improves accuracy by 15.3%, reducing estimated annual costs by over 18.5 million CNY.
Progressive Semantic Residual Quantization for Multimodal-Joint Interest Modeling in Music Recommendation
In music recommendation systems, multimodal interest learning is pivotal: it allows the model to capture nuanced preferences, including textual elements such as lyrics and various musical attributes such as different instruments and melodies. Recently, methods that incorporate multimodal content features through semantic IDs have achieved promising results. However, existing methods suffer from two critical limitations: 1) intra-modal semantic degradation, where residual-based quantization processes gradually decouple discrete IDs from original content semantics, leading to semantic drift; and 2) inter-modal modeling gaps, where traditional fusion strategies either overlook modal-specific details or fail to capture cross-modal correlations, hindering comprehensive user interest modeling. To address these challenges, we propose a novel multimodal recommendation framework with two stages. In the first stage, our Progressive Semantic Residual Quantization (PSRQ) method generates modal-specific and modal-joint semantic IDs by explicitly preserving the prefix semantic feature. In the second stage, to model users' multimodal interests, a Multi-Codebook Cross-Attention (MCCA) network is designed to enable the model to simultaneously capture modal-specific interests and perceive cross-modal correlations. Extensive experiments on multiple real-world datasets demonstrate that our framework outperforms state-of-the-art baselines. This framework has been deployed on one of China's largest music streaming platforms, and online A/B tests confirm significant improvements in commercial metrics, underscoring its practical value for industrial-scale recommendation systems.
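For readers unfamiliar with semantic IDs, the sketch below shows plain residual quantization, the generic scheme the abstract above builds on: an embedding is encoded as a short sequence of codebook indices by repeatedly quantizing the remaining residual. It is only an illustration of the baseline, not the paper's PSRQ (which additionally preserves a prefix semantic feature at each level), and the random codebooks stand in for codebooks trained, e.g., with k-means.

    import numpy as np

    def rq_encode(x, codebooks):
        """x: (d,) embedding; codebooks: list of (K, d) arrays, one per level."""
        residual, codes = x.copy(), []
        for cb in codebooks:
            idx = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))  # nearest code at this level
            codes.append(idx)
            residual = residual - cb[idx]      # quantize what remains at the next level
        return codes

    def rq_decode(codes, codebooks):
        return sum(cb[i] for cb, i in zip(codebooks, codes))

    rng = np.random.default_rng(0)
    codebooks = [rng.normal(size=(256, 64)) for _ in range(3)]   # 3 levels, 256 codes each
    x = rng.normal(size=64)
    codes = rq_encode(x, codebooks)
    print(codes, np.linalg.norm(x - rq_decode(codes, codebooks)))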
Retrieval-LTV: Fine-Grained Transfer Learning for Lifetime Value Estimation in Large-Scale Industrial Retrieval
In computational advertising, platforms are increasingly optimizing toward advertisers' real assessment metrics to help achieve more reliable advertising performance. Consequently, predicting customers' Lifetime Value (LTV) has become an essential component of the advertising system, as it directly impacts the actual Return On Investment (ROI) of advertisers. Recent research on LTV prediction primarily focuses on the ranking stage, lacking consideration of the initial retrieval stage. This oversight may lead to inconsistency between retrieval and ranking, resulting in a loss of efficiency. Unlike LTV estimation in the ranking stage, the retrieval stage faces more severe data sparsity and constraints inherent in online scoring. Incorporating rich data from other domains can mitigate the sparsity but introduces the risk of negative transfer. To tackle these challenges, we introduce Retrieval-LTV, a two-tower retrieval model for LTV prediction. This model employs a cooperative framework and incorporates a fine-grained evaluation for each sample across each expert, thereby enhancing effective selective learning from the source domain while mitigating the risk of negative transfer. Additionally, we have designed a specialized representation transformation to obtain the LTV-oriented score for online retrieval. Experiments on three real-world industrial datasets demonstrate that Retrieval-LTV outperforms all the baselines, achieving superior performance. An online A/B test further confirms the effectiveness of Retrieval-LTV, increasing the overall LTV by 2.08%. As a result, Retrieval-LTV has now been fully deployed in Tencent Ads.
You Only Evaluate Once: A Tree-based Rerank Method at Meituan
Reranking plays a crucial role in modern recommender systems by capturing the mutual influences within the list. Due to the inherent challenges of combinatorial search spaces, most methods adopt a two-stage search paradigm: a simple General Search Unit (GSU) efficiently reduces the candidate space, and an Exact Search Unit (ESU) effectively selects the optimal sequence. These methods essentially involve making trade-offs between effectiveness and efficiency, while suffering from a severe inconsistency problem: the GSU often filters out high-value lists that the ESU would select. To address this problem, we propose YOLOR, a one-stage reranking method that removes the GSU while retaining only the ESU. Specifically, YOLOR includes: (1) a Tree-based Context Extraction Module (TCEM) that hierarchically aggregates multi-scale contextual features to achieve "list-level effectiveness", and (2) a Context Cache Module (CCM) that enables efficient feature reuse across candidate permutations to achieve "permutation-level efficiency". Extensive experiments across public and industry datasets validate YOLOR's performance, and we have successfully deployed YOLOR on the Meituan food delivery platform.
FinSage: A Multi-aspect RAG System for Financial Filings Question Answering
Leveraging large language models in real-world settings often requires domain-specific data and tools in order to comply with the complex regulations governing acceptable use. Within financial sectors, modern enterprises increasingly rely on Retrieval-Augmented Generation (RAG) systems to address complex information retrieval in financial document workflows. However, existing solutions struggle to account for the inherent heterogeneity of data (e.g., text, tables, diagrams) and the evolving complexity of financial filings, leading to compromised accuracy in critical information extraction. We propose FinSage as a solution: a multi-aspect RAG framework tailored for data retrieval and summarization in multi-modal financial documents. FinSage introduces three innovative components: (1) a multi-modal pre-processing pipeline that unifies diverse data formats and generates chunk-level metadata summaries, (2) a multi-path sparse-dense retrieval system augmented with query expansion (HyDE) and metadata-aware semantic search, and (3) a domain-specialized re-ranking module fine-tuned via Direct Preference Optimization to prioritize ground-truth-related content. Extensive experiments demonstrate that FinSage achieves an impressive recall of 92.51% on 75 expert-curated questions derived from financial filings and surpasses the best baseline method on the FinanceBench question answering dataset by 24.06% in accuracy. Moreover, FinSage has been successfully deployed as a financial question-answering system in online meetings, where it has already served more than 1,200 people. The implementation is publicly available at https://github.com/simplew4y/finsage.
EduCraft: A System for Generating Pedagogical Lecture Scripts from Long-Context Multimodal Presentations
Educators face substantial workload pressures, with significant time invested in preparing teaching materials. Generating high-quality lecture scripts from multimodal presentations is a particularly demanding aspect of this preparation. This paper introduces EduCraft, a novel system designed to automate Lecture Script Generation (LSG), addressing key difficulties such as comprehensive multimodal understanding, long-context coherence, and instructional design efficacy. EduCraft features a modular architecture comprising: (1) a Multimodal Input Processing pipeline for robust data extraction and association from slides; (2) a core Lecture Script Generation Engine with instruction-guided VLM and Caption+LLM workflows for pedagogical synthesis; (3) an optional Knowledge Augmentation Module using Retrieval-Augmented Generation (RAG) for enhanced factual grounding; and (4) a Model Integration and Deployment Interface supporting diverse AI models and providing a deployable API. Extensive evaluations, including human assessments and a new automated evaluation framework, demonstrate that EduCraft significantly outperforms strong baselines and teacher-refined scripts in producing coherent, readable, and pedagogically sound lecture scripts. By effectively tackling core LSG challenges, EduCraft offers a practical, configurable solution to reduce educator workload and enhance educational content creation. We open-source EduCraft at https://github.com/wyuc/EduCraft.
CSRM-LLM: Embracing Multilingual LLMs for Cold-Start Relevance Matching in Emerging E-commerce Markets
As global e-commerce platforms continue to expand, companies are entering new markets where they encounter cold-start challenges due to limited human labels and user behaviors. In this paper, we share our experience at Coupang in delivering competitive cold-start relevance matching performance for emerging e-commerce markets. Specifically, we present a Cold-Start Relevance Matching (CSRM) framework, utilizing a multilingual Large Language Model (LLM) to address three challenges: (1) activating cross-lingual transfer learning abilities of LLMs through machine translation tasks; (2) enhancing query understanding and incorporating e-commerce knowledge by retrieval-based query augmentation; (3) mitigating the impact of training label errors through a multi-round self-distillation training strategy. Our experiments demonstrate the effectiveness of CSRM-LLM and the proposed techniques, resulting in successful real-world deployment and significant online gains, with a 45.8% reduction in defect ratio and a 0.866% uplift in session purchase rate.
Audience-Aware and Self-Adaptive Multi-Interest Modeling for Sharing Rate Prediction in Affiliate Marketing
Affiliate marketing, a component of modern digital marketing, leverages partnerships among merchants, promoters, and consumers to enhance item visibility and drive sales. Promoters act as critical intermediaries, sharing items with their communities while earning commissions. Accurate prediction of the sharing rate of promoters enables platforms to optimize recommendation performance, thereby improving promotional efficiency. However, existing methods are mainly designed for consumer-oriented (C-end) scenarios and face significant limitations in modeling promoters (B-end), who are typically characterized by audience group attachment. Specifically, three core challenges emerge: (1) how to organically integrate audience preferences while maintaining promoter dominance, (2) how to accommodate promoters' diverse interest scopes, and (3) how to capture the complex one-to-many relationships between promoters and their audiences. For Challenge (1), we employ a dynamic routing mechanism based on interest capsules to model the diverse interests of promoters, where audience groups are used to optimize the interest routing via a novel dual-channel attention mechanism, thus allowing audience groups to explicitly participate in the promoter decision-making process with an auxiliary role. For Challenge (2), a parameter-free, confidence-aware interest activation mechanism is introduced to adaptively select sparse interest capsules. For Challenge (3), we pioneer the use of hypergraphs in CTR prediction to model one-to-many relationships between promoters and audiences. Extensive experiments are conducted on two real-world datasets to validate the effectiveness of our approach. Furthermore, the model is deployed on the Alimama platform, which hosts over 100,000 promoters. Online A/B testing results demonstrate that our method achieves a 5.31% average improvement over online baselines.
Dynamic Network-Based Two-Stage Time Series Forecasting for Affiliate Marketing
In recent years, affiliate marketing has emerged as a revenue-sharing strategy where merchants collaborate with promoters to promote their products. It not only increases product exposure but also allows promoters to earn a commission. This paper addresses the pivotal yet under-explored challenge in affiliate marketing: accurately assessing and predicting the contributions of promoters in product promotion. We design a novel metric for evaluating the indirect contributions of the promoter, called propagation scale. Unfortunately, existing time series forecasting techniques fail to deliver accurate predictions due to the propagation scale being influenced by multiple factors and the inherent complexities arising from dynamic scenarios. To address this issue, we decouple the network structure from the node signals and propose a two-stage solution: initially, the basic self-sales and network structure prediction are conducted separately, followed by the synthesis of the propagation scale. Specifically, we design a graph convolution encoding scheme based on descendant neighbors and incorporate hypergraph convolution to efficiently capture complex promotional dynamics. Additionally, three auxiliary tasks are employed: self-sales prediction for base estimations, descendant prediction to synthesize propagation scale, and promoter activation prediction to mitigate high volatility issues. Extensive offline experiments on large-scale industrial datasets validate the superiority of our method. We further deploy our model on Alimama platform with over 100,000 promoters, achieving a 9.29% improvement in GMV and a 5.89% increase in sales volume.
Leveraging Generative Models for Real-Time Query-Driven Text Summarization in Large-Scale Web Search
In the dynamic landscape of large-scale web search, Query-Driven Text Summarization (QDTS) aims to generate concise and informative summaries from textual documents based on a given query, which is essential for improving user engagement and facilitating rapid decision-making. Traditional extractive summarization models, based primarily on ranking candidate summary segments, have been the dominant approach in industrial applications. However, these approaches suffer from two key limitations: 1) The multi-stage pipeline often introduces cumulative information loss and architectural bottlenecks due to its weakest component; 2) Traditional models lack sufficient semantic understanding of both user queries and documents, particularly when dealing with complex search intents. In this study, we propose a novel framework to pioneer the application of generative models to address real-time QDTS in industrial web search. Our approach integrates large model distillation, supervised fine-tuning, direct preference optimization, and lookahead decoding to transform a lightweight model with only 0.1B parameters into a domain-specialized QDTS expert. Evaluated on multiple industry-relevant metrics, our model outperforms the production baseline and achieves a new state of the art. Furthermore, it demonstrates excellent deployment efficiency, requiring only 334 NVIDIA L20 GPUs to handle ~50,000 queries per second under 55 ms average latency per query.
Climber: Toward Efficient Scaling Laws for Large Recommendation Models
Transformer-based generative models have achieved remarkable success across domains with various scaling law manifestations. However, our extensive experiments reveal persistent challenges when applying Transformers to recommendation systems: (1) Transformer performance does not scale ideally with increased computational resources, due to structural incompatibilities with recommendation-specific features such as multi-source data heterogeneity; (2) critical online inference latency constraints (tens of milliseconds) that intensify with longer user behavior sequences and growing computational demands. We propose Climber, an efficient recommendation framework comprising two synergistic components: the model architecture for efficient scaling and the co-designed acceleration techniques. Our proposed model adopts two core innovations: (1) multi-scale sequence extraction that achieves a time complexity reduction by a constant factor, enabling more efficient scaling with sequence length; (2) dynamic temperature modulation adapting attention distributions to the multi-scenario and multi-behavior patterns. Complemented by acceleration techniques, Climber achieves a 5.15× throughput gain without performance degradation by adopting "single user, multiple item" batched processing and memory-efficient Key-Value caching. Comprehensive offline experiments on multiple datasets validate that Climber exhibits a more favorable scaling curve. To our knowledge, this is the first publicly documented framework where controlled model scaling drives continuous online metric growth (12.19% overall lift) without prohibitive resource costs. Climber has been successfully deployed on Netease Cloud Music, one of China's largest music streaming platforms, serving tens of millions of users daily.
HIT Model: A Hierarchical Interaction-Enhanced Two-Tower Model for Pre-Ranking Systems
Online display advertising platforms rely on pre-ranking systems to efficiently filter and prioritize candidate ads from large corpora, balancing relevance to users with strict computational constraints. The prevailing two-tower architecture, though highly efficient due to its decoupled design and pre-caching, suffers from limited cross-tower interaction and coarse similarity metrics, undermining its capacity to model complex user-ad relationships. In this study, we propose the Hierarchical Interaction-Enhanced Two-Tower (HIT) model, a new architecture that augments the two-tower paradigm with two key components: generators that pre-generate holistic vectors incorporating coarse-grained user-ad interactions through a dual-generator framework with a cosine-similarity-based generation loss as the training objective, and multi-head representers that project embeddings into multiple latent subspaces to capture fine-grained, multi-faceted user interests and multi-dimensional ad attributes. This design enhances modeling effectiveness without compromising inference efficiency. Extensive experiments on public datasets and large-scale online A/B testing on Tencent's advertising platform demonstrate that HIT significantly outperforms several baselines in relevance metrics, yielding a 1.66% increase in Gross Merchandise Volume and a 1.55% improvement in Return on Investment, while maintaining serving latency similar to that of vanilla two-tower models. The HIT model has been successfully deployed in Tencent's online display advertising system, serving billions of impressions daily. The code is available at https://github.com/HarveyYang123/HIT_model.
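The multi-head idea in the abstract above can be illustrated with a minimal two-tower sketch in which each tower is projected into several subspaces and the score is taken from the best-matching pair of heads. This is a simplified illustration only, not the HIT model (which additionally uses generator modules trained with a cosine-similarity generation loss); all layer sizes are arbitrary.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Tower(nn.Module):
        def __init__(self, in_dim, dim=64, heads=4):
            super().__init__()
            self.heads, self.dim = heads, dim
            self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, heads * dim))

        def forward(self, x):                       # -> (batch, heads, dim), unit-norm
            h = self.net(x).view(-1, self.heads, self.dim)
            return F.normalize(h, dim=-1)

    def score(user_heads, ad_heads):
        # cosine similarity between every user head and every ad head,
        # then take the best-matching pair as the pre-ranking score
        sims = torch.einsum("bhd,bkd->bhk", user_heads, ad_heads)
        return sims.flatten(1).max(dim=1).values    # (batch,)

    user_tower, ad_tower = Tower(in_dim=128), Tower(in_dim=96)
    u = user_tower(torch.randn(8, 128))
    a = ad_tower(torch.randn(8, 96))
    print(score(u, a).shape)                        # torch.Size([8])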
CheckDAPR: An MLLM-based Sketch Analysis System for Draw-A-Person-in-the-Rain Assessments
Sketch-based drawing assessments in art therapy are commonly used to understand the cognitive and psychological states of individuals. In conjunction with self-report measures, drawing assessments serve to enhance insights into an individual's psychological state. However, interpreting the drawing assessments is labor-intensive and substantially reliant on the experience of the art therapists. While a few automated approaches for analyzing drawing-based assessments have been proposed to remedy this issue, they mostly rely on existing object detection methods, where complex drawing attributes cannot be accurately decoded. To overcome these challenges, we propose a novel and comprehensive Draw-A-Person-in-the-Rain (DAPR) analysis system, CheckDAPR, which utilizes a Multimodal Large Language Model (MLLM) with object detection methods for in-depth evaluation. Our experimental results show the promising performance of CheckDAPR and its ability to reduce analysis time for art therapists, indicating its potential to aid professionals in art therapy.
DAS: Dual-Aligned Semantic IDs Empowered Industrial Recommender System
Semantic IDs are discrete identifiers generated by quantizing Multi-modal Large Language Model embeddings, enabling efficient multi-modal content integration in recommendation systems. However, their lack of collaborative signals results in a misalignment with downstream discriminative and generative recommendation objectives. Recent studies have introduced various alignment mechanisms to address this problem, but their two-stage framework design still leads to two main limitations: (1) inevitable information loss during alignment, and (2) inflexibility in applying adaptive alignment strategies, consequently constraining the mutual information maximization during the alignment process. To address these limitations, we propose a novel and flexible one-stage Dual-Aligned Semantic IDs (DAS) method that simultaneously optimizes quantization and alignment, preserving semantic integrity and alignment quality while avoiding the information loss typically associated with two-stage methods. Meanwhile, DAS achieves more efficient alignment between the semantic IDs and collaborative signals, with the following two innovative and effective approaches: (1) Multi-view Contrastive Alignment: To maximize mutual information between semantic IDs and collaborative signals, we first incorporate an ID-based CF debias module, and then design three effective contrastive alignment methods: dual user-to-item (u2i), dual item-to-item/user-to-user (i2i/u2u), and dual co-occurrence item-to-item/user-to-user (i2i/u2u). (2) Dual Learning: By aligning the dual quantizations of users and ads, the constructed semantic IDs for users and ads achieve stronger alignment. Finally, offline experiments and online A/B tests confirm DAS's efficacy; it now serves 400M+ users on Kuaishou's ad platform.
InterFormer: Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction
Click-through rate (CTR) prediction, which predicts the probability of a user clicking an ad, is a fundamental task in recommender systems. The emergence of heterogeneous information, such as user profile and behavior sequences, depicts user interests from different aspects. A mutually beneficial integration of heterogeneous information is the cornerstone towards the success of CTR prediction. However, most of the existing methods suffer from two fundamental limitations, including (1) insufficient inter-mode interaction due to the unidirectional information flow between modes, and (2) aggressive information aggregation caused by early summarization, resulting in excessive information loss. To address these limitations, we propose a novel module named InterFormer to learn heterogeneous information interaction in an interleaving style. To achieve better interaction learning, InterFormer enables bidirectional information flow for mutually beneficial learning across different modes. To avoid aggressive information aggregation, we retain complete information in each data mode and use a separate Cross Arch for effective information selection and summarization. InterFormer has been deployed across multiple platforms at Meta Ads, achieving 0.15% performance gain and 24% QPS gain compared to prior state-of-the-art models and yielding sizable online impact.
Meta-Adaptive Network for Effective Cold-Start Recommendation via Warm-Aware Representation Learning
Click-Through Rate (CTR) prediction models enable users to discover matched items in recommender systems. Industrial-scale models typically adopt a unified embedding approach for both hot and cold items. However, existing embedding-based models exhibit limitations in representation learning for cold items due to sparse historical user interactions. In this paper, we propose a Meta-Adaptive Network for Effective Cold-Start Recommendation (MANE). Inspired by meta-learning, we develop a lightweight plug-and-play meta-learner that generates enhanced, full-lifecycle representations for cold items. Our meta-network dynamically adjusts the contribution of generalized features in final representations as item exposure increases, enabling adaptive balancing between generalization and specificity for cold items. In addition, certain high-potential items in cold-start scenarios face challenges in effective exposure due to limited interaction signals. Therefore, we further propose a novel representation learning method that incorporates a warm-aware contrastive loss, which aligns the representations of cold items with those of hot items exhibiting high multimodal similarity. Experimental results on the Taobao production dataset and online A/B testing validate the effectiveness of our method, achieving improvements of 4.34% in item page views (IPV) and 2.84% in CTR.
Edge-Variational Graph Neural Networks: Harnessing Weak Ties for Enhanced Default Risk Prediction
Default risk prediction (DRP) leveraging financial relational networks (FRNs) has seen extensive application in recent years. Connection confidence within an FRN is pivotal in ensuring the efficacy of FRN-based DRP, with strong connections offering a more substantial predictive impact and weak connections providing supplementary predictive insights. However, in practical DRP implementations, the confidence in both strong and weak connections may be compromised due to fraudulent activities and misinformation caused by data collection biases, necessitating a novel method to effectively assess and calibrate low-confidence connections within the FRN to enhance DRP. To address this challenge, we propose a novel method named Edge-Variational Graph Neural Networks (EVGNN). During the graph encoding phase, EVGNN employs variational inference to assess connection confidence within the FRN, eliminating low-confidence connections and reinforcing weak connections with significant predictive value. In the decoding phase, EVGNN recodes network nodes based on the calibrated FRN structure, yielding refined node representations that bolster DRP. Empirical evaluation based on public datasets and a large-scale FRN dataset from a real-world DRP scenario validates the effectiveness of the proposed method in identifying connection confidence and affirms the validity of utilizing calibrated FRN to enhance DRP.
Augmenting Guest Search Results with Recommendations at Airbnb
Users on Airbnb often perform exhaustive searches with varying conditions to find suitable accommodations. However, overly narrow search criteria can lead to insufficient results, causing frustration and abandonment of search journeys. To address these challenges, we developed flexible pivot recommendations that dynamically augment search results by suggesting alternative dates, relaxing amenity requirements, or adjusting price constraints. These recommendations align with users' broader travel intent, resulting in a measurable improvement in booking rates on the platform. Our solution introduces two key innovations: (1) a modular and extensible architecture to generate the flexible pivot recommendations that integrates seamlessly with Airbnb's existing search ranking system, enabling rapid iteration and minimizing maintenance overhead; and (2) an efficient approach leveraging transfer learning and a Mixture of Experts (MoE) architecture to rank recommendations alongside organic search results. This approach handles diverse scenarios, from single to multiple recommendations, while addressing cold-start challenges and supporting ongoing enhancements. Our solution's scalability and generalizability make it applicable to industries such as online travel agencies and e-commerce platforms, where users benefit from more diverse, intent-aligned recommendations.
Maps Ranking Optimization in Airbnb
Search results on Airbnb are presented in two user interfaces: a list of rectangular cards, referred to as the feed result, that include photos, prices, ratings, and other details of the listings, and listing pins on the map, referred to as the map result, which can either display as price pins or appear without prices as mini-pins. The map plays a key role in Airbnb. Not only does it display the location of the listings in the search results, but it also serves as an interactive user interface that allows users to view the details of the pins and perform new searches by moving or zooming on the map. The majority of searches on Airbnb are conducted using the map, and the majority of bookings involve listings shown in map search results. Limited research has been conducted within the industry to address the unique challenges of maps ranking. For years, it was assumed that showing the top K results based on the model designed for feed ranking was also optimal for the map. However, this assumption breaks down when we take a closer look at the NDCG (Normalized Discounted Cumulative Gain) metric. Attention is key to NDCG, and the attention flow on the feed does not apply to the map. In this paper, we begin with NDCG theory and redesign a map-specific NDCG by introducing three types of attention factors. We conducted a series of experiments to test whether optimizing map NDCG could drive booking rates on Airbnb, and the results strongly supported this hypothesis.
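To make the attention-factor idea concrete, here is a hypothetical illustration (not Airbnb's implementation) of an NDCG variant in which the usual log-position discount is replaced by explicit attention weights per displayed pin; the weights and their assignment to pin types are assumptions.

```python
# Hypothetical sketch: NDCG with per-pin attention factors instead of the
# standard log-position discount. Weights are assumed values, e.g. price pins
# receiving more attention than mini-pins.
import numpy as np

def attention_ndcg(relevance, attention):
    """relevance, attention: equal-length arrays aligned by displayed pin."""
    relevance = np.asarray(relevance, dtype=float)
    attention = np.asarray(attention, dtype=float)
    dcg = np.sum((2 ** relevance - 1) * attention)
    # the ideal ordering pairs the highest gains with the highest attention factors
    idcg = np.sum((2 ** np.sort(relevance)[::-1] - 1) * np.sort(attention)[::-1])
    return dcg / idcg if idcg > 0 else 0.0

# the booked listing (relevance 1) sits under the second-most-attended pin
print(attention_ndcg([0, 1, 0, 0], [0.4, 0.3, 0.2, 0.1]))  # 0.75
```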
D3-TR: Data-driven Daily Delivery Task Rescheduling for Cost-effective Last-mile Delivery
In last-mile logistics, couriers are typically assigned fixed zones to perform door-to-door deliveries. In practice, packages in some delivery zones might not be fulfilled on time due to couriers taking irregular leave for sickness or higher-priority task assignments, e.g., services for VIPs and regulatory training. Beyond the costly real-world practice, i.e., hiring temporary workers, analysis of historical data reveals that a daily delivery task rescheduling among on-duty couriers can be a cost-effective and efficient alternative. It involves individual workload assessments and delivery task assignments, both of which existing methods cannot address adequately: (i) Existing courier workload assessment methods are not tailored for downstream optimization tasks, leading to poor performance. (ii) Efficiency-oriented task assignment methods may lead to unfair workloads among the couriers. To address the above two limitations, in this paper, we propose D3-TR, a data-driven method for task reassignment among on-duty couriers. Firstly, we design a consistency-guided predictor that can quickly and precisely predict the workload of couriers. Secondly, based on this predictor, we design a workload-aware genetic algorithm to solve the optimal task allocation problem. Experimental results underscore the superiority of our method over several baselines. Furthermore, real-world deployment on millions of orders demonstrates the effectiveness of our solution, yielding an average of 3.9% improvement in the on-time delivery rate.
Billion-Scale Graph Deep Learning Framework for Ads Recommendation
In this paper, we systematically present BHG, a graph deep learning framework for daily users' ads recommendations. BHG mainly relies on two pillars: (1) graph tokenization to convert the input temporal heterogeneous graph into sequences of tokens, and (2) a graph MLP-Mixer neural architecture to learn node representations on sequences of tokens in a mini-batch manner. In general, BHG offers three advantages: (1) flexibility, i.e., BHG can be seamlessly integrated with any existing industrial recommendation model by treating the learned node embeddings as additional features that encode interactions, (2) efficiency, i.e., the graph tokenization allows sampling the neighborhood both locally and globally, and reduces the number of nodes considered for aggregation, and (3) model simplicity, i.e., the graph MLP-Mixer does not require self-attention for aggregating nodes and hence remains simple. We demonstrate the superior performance of the proposed BHG on two internal datasets and one public dataset. We hope this paper shares insights into large-scale graph deep learning deployments for researchers, engineers, and practitioners.
FLAIR: Feedback Learning for Adaptive Information Retrieval
Recent advances in Large Language Models (LLMs) have driven the adoption of copilots in complex technical scenarios, underscoring the growing need for specialized information retrieval solutions. In this paper, we introduce FLAIR, a lightweight, feedback learning framework that adapts copilot systems' retrieval strategies by integrating domain-specific expert feedback. FLAIR operates in two stages: an offline phase obtains indicators from (1) user feedback and (2) questions synthesized from documentation, storing these indicators in a decentralized manner. An online phase then employs a two-track ranking mechanism to combine raw similarity scores with the collected indicators. This iterative setup refines retrieval performance for any query. Extensive real-world evaluations of FLAIR demonstrate significant performance gains on both previously seen and unseen queries, surpassing state-of-the-art approaches. The system has been successfully integrated into Copilot DECO, serving thousands of users at Microsoft, demonstrating its scalability and effectiveness in operational environments.
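A minimal sketch of the two-track idea, assuming a simple linear blend of the raw similarity score with an offline feedback indicator; the field names and weighting below are hypothetical and do not describe FLAIR's actual mechanism.

```python
# Hypothetical sketch: re-rank retrieval candidates by blending an online
# embedding-similarity score with an indicator aggregated offline from expert
# feedback and synthesized questions.
def two_track_rank(candidates, alpha=0.7):
    """candidates: list of dicts with 'doc_id', 'similarity' in [0, 1],
    and 'indicator' in [0, 1] collected offline."""
    scored = [
        (alpha * c["similarity"] + (1 - alpha) * c["indicator"], c["doc_id"])
        for c in candidates
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

print(two_track_rank([
    {"doc_id": "kb-12", "similarity": 0.82, "indicator": 0.2},
    {"doc_id": "kb-07", "similarity": 0.78, "indicator": 0.9},
]))  # the well-reviewed document outranks the slightly more similar one
```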
Towards Unbiased and Real-Time Staytime Prediction for Live Streaming Recommendation
Live streaming has emerged as a dynamic content format that delivers real-time and interactive experiences to users. Distinguished by the short lifespan and immersive nature of live rooms, live streaming poses two key challenges for recommendation: (1) Timeliness: the model must rapidly identify and promote relevant live rooms to target users within a limited window; and (2) Accurate staytime prediction: since extended watching often reflects content quality and user satisfaction, precisely predicting staytime serves as a critical indicator of recommendation relevance and user engagement. Existing approaches often improve timeliness by repeatedly sending staytime signals to accelerate model learning. However, this introduces label truncation bias, distorting the unbiased estimation of high staytime samples. To reconcile these competing demands, we propose MS3M (Multi-Stream Segmented Staytime Modeling), a novel framework that leverages multiple data streams for faster learning while employing segmented staytime modeling, which converts staytime regression into a series of time-segmented classification tasks to ensure unbiased training. Furthermore, to address the sparsity of high staytime samples, MS3M's task-dependent architecture allows high staytime parameters to leverage prior knowledge from low staytime data, significantly improving generalization for long-duration watching behaviors. Extensive offline experiments and online A/B tests on TikTok confirm that MS3M effectively balances timeliness and unbiased learning, leading to substantial gains in recommendation accuracy. The proposed approach currently serves TikTok's live streaming recommendation system, contributing to continuous improvement in user watching experience.
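The segmentation idea can be made concrete with a small hypothetical sketch: a staytime value is converted into per-segment classification targets, and an estimate can be read back from predicted segment probabilities. The boundaries and the reconstruction below are our assumptions, not the MS3M design.

```python
# Hypothetical sketch: time-segmented classification targets for staytime.
import numpy as np

SEGMENT_BOUNDS = [10, 30, 60, 180, 600]  # seconds (assumed boundaries)

def segment_labels(staytime_sec):
    """Label k = 1 if the user stayed at least SEGMENT_BOUNDS[k] seconds."""
    return [int(staytime_sec >= b) for b in SEGMENT_BOUNDS]

def expected_staytime(segment_probs):
    """Crude lower-bound estimate of staytime from per-segment survival probabilities."""
    bounds = np.array([0] + SEGMENT_BOUNDS, dtype=float)
    probs = np.asarray(segment_probs, dtype=float)   # P(staytime >= bound_k)
    return float(np.sum((bounds[1:] - bounds[:-1]) * probs))

print(segment_labels(45))                              # [1, 1, 0, 0, 0]
print(expected_staytime([0.9, 0.6, 0.3, 0.1, 0.02]))   # 50.4
```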
Personalized Multi Modal Alignment Encoding for CTR-Recommendation in WeChat
In recent years, with the significant evolution of multi-modal large models, many recommender systems researchers have realized the potential of multi-modal information for user interest modeling. In industrial recommendation systems, a widely used modeling architecture is to first pre-train a multi-modal model to provide omnipotent representations and then encode them into discrete semantic IDs for the online model. Although such a paradigm achieves remarkable improvements, two problems still limit model performance: (1) Modalities Mapping Independence: each modality's representation is independently mapped to a semantic space and assigned its own code, which ignores the consistency and complementarity of different modalities of the same item. (2) User-irrelevant Clustering Assignment: for a specific item, most existing quantization methods assume that all users share the same cluster assignments, failing to account for the varying interpretations and emotional responses users may have toward an item. To address these challenges, we propose a Personalized Multi Modal Alignment Encoding for CTR-Recommendation in WeChat (PMMAE for short). First, we design a multi-modal contrastive alignment module to ensure the consistency of the various modality encodings, and we then fuse them to form a consistent and comprehensive semantic label. Second, a meta network fed with users' interest embeddings is learned to generate personalized functions that achieve personalized clustering assignment for each user. Benefiting from the meta-generated personalized assignment function, we can take full account of the variability in users' understanding of items. Extensive experimental results demonstrate that our model PMMAE significantly outperforms baseline models on both offline performance and online A/B tests in the WeChat recommendation scenario. Our model has been deployed on WeChat's various services, serving hundreds of millions of users daily.
RankMixer: Scaling Up Ranking Models in Industrial Recommenders
Recent progress on large language models (LLMs) has spurred interest in scaling up recommendation systems, yet two practical obstacles remain. First, training and serving costs on industrial recommenders must respect strict latency bounds and high QPS demands. Second, most human-designed feature-crossing modules in ranking models were inherited from the CPU era and fail to exploit modern GPUs, resulting in low Model Flops Utilization (MFU) and poor scalability. We introduce RankMixer, a hardware-aware model design tailored towards a unified and scalable feature-interaction architecture. RankMixer retains the transformer's high parallelism while replacing quadratic self-attention with a multi-head token mixing module for higher efficiency. Besides, RankMixer maintains both the modeling of distinct feature subspaces and cross-feature-space interactions with per-token FFNs. We further extend it to one billion parameters with a Sparse-MoE variant for higher ROI. A dynamic routing strategy is adopted to address the inadequacy and imbalance of expert training. Experiments show RankMixer's superior scaling abilities on a trillion-scale production dataset. By replacing previously diverse handcrafted low-MFU modules with RankMixer, we boost the model MFU from 4.5% to 45% and scale our online ranking model parameters by two orders of magnitude while maintaining roughly the same inference latency. We verify RankMixer's universality with online A/B tests across two core application scenarios (Recommendation and Advertisement). Finally, we launch the 1B dense-parameter RankMixer for full traffic serving without increasing the serving cost, which improves user active days by 0.3% and total in-app usage duration by 1.08%.
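The attention-free design can be illustrated with a simplified, MLP-Mixer-style block: token mixing plus per-token FFNs. This is a stand-in sketch under our own assumptions, not the RankMixer architecture itself, and it omits the multi-head mixing and Sparse-MoE details.

```python
# Simplified sketch: attention-free token mixing across feature tokens followed
# by a separate FFN per token, so each feature subspace keeps its own parameters.
import torch
import torch.nn as nn

class AttentionFreeBlock(nn.Module):
    def __init__(self, num_tokens, dim):
        super().__init__()
        self.token_mix = nn.Linear(num_tokens, num_tokens)   # mixes information across tokens
        self.per_token_ffn = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 2 * dim), nn.ReLU(), nn.Linear(2 * dim, dim))
            for _ in range(num_tokens)
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):  # x: [batch, num_tokens, dim]
        # mix along the token axis (no self-attention)
        x = x + self.token_mix(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        # apply one FFN per token, preserving distinct feature subspaces
        x = x + torch.stack(
            [ffn(t) for ffn, t in zip(self.per_token_ffn, self.norm2(x).unbind(dim=1))],
            dim=1,
        )
        return x

block = AttentionFreeBlock(num_tokens=6, dim=32)
print(block(torch.randn(4, 6, 32)).shape)  # torch.Size([4, 6, 32])
```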
SESSION: Resource Papers
IARD: Intruder Activity Recognition Dataset for Threat Detection
Home security and surveillance systems are rapidly evolving, with Artificial Intelligence (AI) playing a transformative role in enhancing safety and threat detection. While several AI methods and datasets for intruder-related risk assessment exist, they predominantly focus on face detection and recognition, leaving a significant gap in addressing high-risk scenarios involving malicious intent, such as theft or harm. The lack of dedicated datasets for recognizing complex intruder activities, such as carrying weapons or engaging in destructive actions like kicking doors or breaking locks, limits the development of robust solutions. This work bridges this gap by introducing the Intruder Activity Recognition Dataset (IARD), a video dataset specifically designed to recognize four critical intruder activities: Armed Intruder, Door Kick, Intruder Inside and Lock Breaking. Leveraging IARD, we thoroughly benchmark various state-of-the-art methods, among which a Vision Transformer is found to achieve an impressive 93.3% accuracy in recognizing intruder actions. Our contribution highlights the potential of IARD in advancing AI-driven surveillance systems, providing a foundational dataset and benchmark for recognizing complex intruder activities.
ReDSM5: A Reddit Dataset for DSM-5 Depression Detection
Depression is a pervasive mental health condition that affects hundreds of millions of individuals worldwide, yet many cases remain undiagnosed due to barriers in traditional clinical access and pervasive stigma. Social media platforms, and Reddit in particular, offer rich, user-generated narratives that can reveal early signs of depressive symptomatology. However, existing computational approaches often label entire posts simply as depressed or not depressed, without linking language to specific criteria from the DSM-5, the standard clinical framework for diagnosing depression. This limits both clinical relevance and interpretability. To address this gap, we introduce ReDSM5, a novel Reddit corpus comprising 1484 long-form posts, each exhaustively annotated at the sentence level by a licensed psychologist for the nine DSM-5 depression symptoms. For each label, the annotator also provides a concise clinical rationale grounded in DSM-5 methodology. We conduct an exploratory analysis of the collection, examining lexical, syntactic, and emotional patterns that characterize symptom expression in social media narratives. Compared to prior resources, ReDSM5 uniquely combines symptom-specific supervision with expert explanations, facilitating the development of models that not only detect depression but also generate human-interpretable reasoning. We establish baseline benchmarks for both multi-label symptom classification and explanation generation, providing reference results for future research on detection and interpretability.
hopwise: A Python Library for Explainable Recommendation based on Path Reasoning over Knowledge Graphs
Explainability is becoming central to the development of responsible recommender systems, especially as path reasoning over knowledge graphs has seen increased adoption for extracting structured, semantic user-item connections. However, reproducible research in this field remains limited due to fragmented implementations, missing utilities, and the lack of standardized evaluation pipelines. In this paper, we propose hopwise, an open-source library that supports the full life-cycle of explainable path reasoning recommendation methods over knowledge graphs, from knowledge graph preparation to explanation path delivery and evaluation. Rather than creating a new library from scratch, hopwise builds upon the modular and widely adopted RecBole ecosystem, enriching it with more knowledge graphs, path sampling utilities, path reasoning methods, and metrics for evaluating explanation path utility, coverage, and diversity. We show the framework's utility by means of a benchmark including two knowledge graphs and several recommendation methods. Code and Data: https://github.com/tail-unica/hopwise.
PyLate: Flexible Training and Retrieval for Late Interaction Models
Neural ranking has become a cornerstone of modern information retrieval. While single vector search remains the dominant paradigm, it suffers from the shortcoming of compressing all the information into a single vector. This compression leads to notable performance degradation in out-of-domain, long-context, and reasoning-intensive retrieval tasks. Multi-vector approaches pioneered by ColBERT aim to address these limitations by preserving individual token embeddings and computing similarity via the MaxSim operator. This architecture has demonstrated superior empirical advantages, including enhanced out-of-domain generalization, long-context handling, and performance in complex retrieval scenarios. Despite these compelling empirical results and clear theoretical advantages, the practical adoption and public availability of late interaction models remain low compared to their single-vector counterparts, primarily due to a lack of accessible and modular tools for training and experimenting with such models. To bridge this gap, we introduce PyLate, a streamlined library built on top of Sentence Transformers to support multi-vector architectures natively, inheriting its efficient training, advanced logging, and automated model card generation while requiring minimal code changes to code templates users are already familiar with. By offering multi-vector-specific features such as efficient indexes, PyLate aims to accelerate research and real-world application of late interaction models, thereby unlocking their full potential in modern IR systems. Finally, PyLate has already enabled the development of state-of-the-art models, including GTE-ModernColBERT and Reason-ModernColBERT, demonstrating its practical utility for both research and production environments.
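For readers unfamiliar with late interaction, the MaxSim operator mentioned above reduces to a few lines. This is generic ColBERT-style scoring for illustration, not PyLate's internal implementation.

```python
# Generic ColBERT-style MaxSim scoring: each query token embedding is matched to
# its most similar document token embedding, and the maxima are summed.
import torch
import torch.nn.functional as F

def maxsim_score(query_embs, doc_embs):
    """query_embs: [Q, d], doc_embs: [D, d]; both assumed L2-normalized."""
    sim = query_embs @ doc_embs.T          # [Q, D] token-level cosine similarities
    return sim.max(dim=1).values.sum()     # best document token per query token, then sum

q = F.normalize(torch.randn(5, 128), dim=-1)
d = F.normalize(torch.randn(40, 128), dim=-1)
print(maxsim_score(q, d).item())
```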
VideoAVE: A Multi-Attribute Video-to-Text Attribute Value Extraction Dataset and Benchmark Models
Attribute Value Extraction (AVE) is important for structuring product information in e-commerce. However, existing AVE datasets are primarily limited to text-to-text or image-to-text settings, lacking support for product videos, diverse attribute coverage, and public availability. To address these gaps, we introduce VideoAVE, the first publicly available video-to-text e-commerce AVE dataset, spanning 14 different domains and covering 172 unique attributes. To ensure data quality, we propose a post-hoc CLIP-based Mixture of Experts filtering system (CLIP-MoE) to remove mismatched video-product pairs, resulting in a refined dataset of 224k training samples and 25k evaluation samples. To evaluate the usability of the dataset, we further establish a comprehensive benchmark by evaluating several state-of-the-art video vision language models (VLMs) under both attribute-conditioned value prediction and open attribute-value pair extraction tasks. Our analysis of the results reveals that video-to-text AVE remains a challenging problem, particularly in open settings, and that there is still room for developing more advanced VLMs capable of effectively leveraging temporal information. The dataset and benchmark code for VideoAVE are available at: https://github.com/gjiaying/VideoAVE.
ERASURE: A Modular and Extensible Framework for Machine Unlearning
Machine Unlearning (MU) is an emerging research area that enables models to selectively forget specific data, a critical requirement for privacy compliance (e.g., GDPR, CCPA) and security. However, the lack of standardized benchmarks makes evaluating and developing unlearning methods difficult. To address this gap, we introduce ERASURE, a benchmarking and development framework designed to systematically assess MU techniques. ERASURE provides a modular, extensible, open-source environment with real-world datasets and standardized unlearning measures. The framework is designed with configuration-driven workflows and an inversion of control architecture, allowing integration of new datasets, models, and evaluation measures. ERASURE advances trustworthy AI research as a tool for researchers to develop and benchmark new MU methods.
YTCommentVerse: A Multi-Category Multi-Lingual YouTube Comment Corpus
In this paper, we introduce YTCommentVerse, a large-scale multilingual and multi-category dataset of YouTube comments. It contains over 32 million comments from 178,000 videos contributed by more than 20 million unique users, spanning 15 distinct YouTube content categories such as Music, News, Education, and Entertainment. Each comment in the dataset includes video and comment IDs, user channel details, upvotes, and category labels. With comments in over 50 languages, YTCommentVerse provides a rich resource for exploring sentiment, toxicity, and engagement patterns across diverse cultural and topical contexts. This dataset helps fill a major gap in publicly available social media datasets, particularly for analyzing video-sharing platforms, by combining multiple languages, detailed categories, and other metadata.
Internet of Things Dataset for Human Operator Activity Recognition in Industrial Environment
In industrial environments, most production-related activities performed by human operators are complex. Accurate detection of these activities is pivotal, as it can greatly help assess productivity, leading to improvements in worker training, and, in other scenarios, help ensure a safe work environment and reduce injuries. Existing wearable Internet of Things (IoT) datasets for human activity recognition primarily focus on general activities, such as walking and running, and therefore the related machine learning models and datasets are not suitable for industrial environments. In this paper, we present a novel dataset for classifying human operator activities in a meat processing plant where production line operators use knives to cut, process, and produce meat products. Our dataset contains human operator activity data captured using wearable IoT sensors in a meat processing production facility. Through extensive experiments using machine and deep learning, we demonstrate that our dataset is effective and useful for detecting different activities of a human operator working in an industrial environment. To the best of our knowledge, this is the only real-world IoT dataset of its kind that will be made publicly available to support further research into industrial activity recognition. Our dataset and related experiments are available at https://digitalinnovationlab.github.io/mppdataset.
Portuguese post-OCR Resources for Text Optimisation
Optical Character Recognition (OCR) systems are designed to extract text from images. While typically optimised for modern documents, they often struggle when applied to historical documents due to older fonts, complex layouts, and physical degradation, which can result in noisy outputs. To reduce OCR errors, post-OCR algorithms are commonly used; however, their development and evaluation require image-transcription pairs. Compared to other European languages, there is a lack of transcribed documents for historical Portuguese, especially for texts predating the 19th century. To address this gap, we introduce Portuguese post-OCR Resources for Text Optimisation (PORTO), a dataset that spans from the 17th to the 20th centuries. PORTO contains 3,782 image-transcription pairs, along with OCR outputs from four different systems, providing a valuable resource for the development and evaluation of OCR and post-OCR methods tailored to historical Portuguese.
ClimateBench-M: A Multi-Modal Climate Data Benchmark with a Simple Generative Method
Climate science studies the structure and dynamics of Earth's climate system and seeks to understand how climate changes over time, with data usually stored as time series recording climate features, geolocation, time attributes, and so on. Recently, much research attention has been paid to climate benchmarks. In addition to the most common task of weather forecasting, several pioneering benchmarks have been proposed to extend the modality, covering domain-specific applications such as tropical cyclone intensity prediction and flash flood damage estimation, or climate statements and confidence levels expressed in natural language. To further motivate artificial intelligence development for climate science, in this paper, we first contribute a multi-modal climate benchmark, i.e., ClimateBench-M, which aligns (1) time series climate data from ERA5, (2) extreme weather events data from NOAA, and (3) satellite image data from NASA HLS based on a unified spatial-temporal granularity. Second, under each data modality, we also propose a simple but strong generative method that produces competitive performance on weather forecasting, thunderstorm alerts, and crop segmentation tasks in the proposed ClimateBench-M. The data and code of ClimateBench-M are publicly available at https://github.com/iDEA-iSAIL-Lab-UIUC/ClimateBench-M.
FediData: A Comprehensive Multi-Modal Fediverse Dataset from Mastodon
Recently, decentralized online social networks (DOSNs) such as Mastodon have emerged quickly, bringing new opportunities for studies in user behavior modeling and multi-modal learning. However, their decentralized architecture presents two key challenges: 1) Distributed data and inconsistent access strategies across several individual instances make a unified collection difficult; 2) user-generated content (UGC) contains multiple modalities while lacking standard organization and high-quality annotation. To address these issues, we constructed FediData, a comprehensive multi-modal dataset from Mastodon. Our dataset integrates user profiles, text, images, and social interactions. To validate FediData's usefulness, we designed and analyzed several tasks and systematically evaluated the performance of existing state-of-the-art methods. Our analysis reveals the unique challenges of DOSNs and highlights the value of FediData in DOSN-related studies. We believe FediData could serve as a foundational dataset for advancing user behavior analytics, multi-modal learning, and future decentralized web research. All data and documentation are available in a Zenodo repository at https://zenodo.org/records/15621243 (DOI: 10.5281/zenodo.15621243).
STM-Graph: A Python Framework for Spatio-Temporal Mapping and Graph Neural Network Predictions
Urban spatio-temporal data present unique challenges for predictive analytics due to their dynamic and complex nature. We introduce STM-Graph, an open-source Python framework that transforms raw spatio-temporal urban event data into graph representations suitable for Graph Neural Network (GNN) training and prediction. STM-Graph integrates diverse spatial mapping methods, urban features from OpenStreetMap, multiple GNN models, comprehensive visualization tools, and a graphical user interface (GUI) suitable for professional and non-professional users. This modular and extensible framework facilitates rapid experimentation and benchmarking. It allows integration of new mapping methods and custom models, making it a valuable resource for researchers and practitioners in urban computing. The source code of the framework and GUI are available at: https://github.com/Ahghaffari/stm_graph and https://github.com/tuminguyen/stm_graph_gui.
E2MoCase: A Dataset for Emotional, Event and Moral Observations in News Articles on High-impact Legal Cases
The way the media report on legal cases can significantly shape public opinion, often embedding subtle biases that influence societal views on justice, fairness, and morality. Analyzing these narratives requires a holistic approach that captures their emotional tone, moral framing, and the specific events they convey. In this work, we introduce E2MoCase, a novel dataset that enables integrated analysis of emotions, morality, and events within legal narratives and media coverage. We leverage NLP models to extract events and predict morality and emotions, providing a multidimensional perspective on how legal cases are portrayed in news articles. Our experimental evaluation showed that E2MoCase is beneficial for addressing emotion- and morality-based tasks, which is also confirmed by a human evaluation of the annotations.
A Large-Scale Web Search Dataset for Federated Online Learning to Rank
The centralized collection of search interaction logs for training ranking models raises significant privacy concerns. Federated Online Learning to Rank (FOLTR) offers a privacy-preserving alternative by enabling collaborative model training without sharing raw user data. However, benchmarks in FOLTR are largely based on random partitioning of classical learning-to-rank datasets, simulated user clicks, and the assumption of synchronous client participation. This oversimplifies real-world dynamics and undermines the realism of experimental results. We present AOL4FOLTR, a large-scale web search dataset with ≈ 2.6 million queries from 10,000 users. Our dataset addresses key limitations of existing benchmarks by including user identifiers, real click data, and query timestamps, enabling realistic user partitioning, behavior modeling, and asynchronous federated learning scenarios.
A Large-Scale Dataset of Interactions Between Weibo Users and Platform-Empowered LLM Agent
We release a large-scale dataset that captures interactions between human users and CommentRobert, an LLM-based social media agent on Weibo. The dataset contains Weibo posts in which users actively mention the LLM agent account @CommentRobert, indicating that the users are interested in interacting with the platform-empowered LLM agent. It comprises 557,645 interactions from 304,400 unique users over 17 months. We detail our data collection methodology, user attributes, and content characteristics, underscoring the dataset's value in examining real-world human-LLM agent interactions. Our analysis offers insights into the demographic and behavioral traits of users interested in the selected LLM agent, interaction dynamics between humans and the agent, and linguistic patterns in comments. These interactions provide a unique lens through which to explore how humans perceive, trust, and communicate with LLMs. This dataset enables further research into modeling human intent understanding, improving LLM agent design, and studying the evolution of human-LLM agent relationships. Potential applications also include long-term user engagement prediction and AI-generated comment detection on social platforms. The dataset is available at https://zenodo.org/records/16921462.
PersonaGen: A Persona-Driven Open-Ended Machine-Generated Text Dataset
We present PersonaGen, a novel dataset for investigating persona-driven machine-generated text (MGT) produced by Open Large Language Models (OLLMs). PersonaGen is specifically designed to investigate how synthetic persona profiles affect, guide, or manifest in MGT. We built PersonaGen by pairing curated persona profiles (i.e., descriptions of characteristics, background, and goals) across eight thematic domains (e.g., Physics, Education, Medicine) with prompts covering various narrative or opinion-style content (e.g., stories, commonsense). Open-ended generations were produced by six representative OLLMs, yielding a total of 1.44 million persona-driven generations. PersonaGen supports multiple research tasks, such as machine-generated text attribution, persona category detection, and persona profile identification, thus providing a valuable resource for studying LLM controllability and role-playing behavior, as well as the impact of persona profile conditioning on downstream tasks. We have released PersonaGen on the Hugging Face platform at https://doi.org/10.57967/hf/5805.
Pet-Bench: Benchmarking the Abilities of Large Language Models as E-Pets in Social Network Services
As interest in using Large Language Models for interactive and emotionally rich experiences grows, virtual pet companionship emerges as a novel yet underexplored application. Existing approaches focus on basic pet role-playing interactions without systematically benchmarking LLMs for comprehensive companionship. In this paper, we introduce PET-BENCH, a dedicated benchmark that evaluates LLMs across both self-interaction and human-interaction dimensions. Unlike prior work, PET-BENCH emphasizes self-evolution and developmental behaviors alongside interactive engagement, offering a more realistic reflection of pet companionship. It features diverse tasks such as intelligent scheduling, memory-based dialogues, and psychological conversations, with over 7,500 interaction instances designed to simulate pet behaviors. Evaluation of 28 LLMs reveals significant performance variations linked to model size and inherent capabilities, underscoring the need for specialized optimization in this domain. PET-BENCH serves as a foundational resource for benchmarking pet-related LLM abilities and advancing emotionally immersive human-pet interactions.
EFT-LR: Benchmarking Learning Rate Policies in Parameter-Efficient Large Language Model Fine-tuning
Large Language Models (LLMs) have had extensive impact across various real-world data mining applications. Given the extremely high cost of training or fine-tuning LLMs, parameter-efficient fine-tuning (e.g., LoRA) has emerged as a popular and practical approach for adapting pre-trained general-purpose LLMs to specific downstream tasks. Among the various hyperparameters involved in parameter-efficient fine-tuning of LLMs, the learning rate (LR) plays a crucial role in determining the overall performance. However, there is no systematic benchmark framework to explore and understand how different LR policies influence the effectiveness of parameter-efficient LLM fine-tuning, which makes it challenging to select an optimal LR policy. To address this critical research gap, this paper introduces a systematic benchmark, EFT-LR, for assessing and selecting LR policies for effective parameter-efficient fine-tuning of LLMs. We first present a collection of seven popular LR policies spanning three major categories in the literature. We then perform parameter-efficient fine-tuning of LLMs using these LR policies and assess the fine-tuned LLMs on eight downstream tasks. Our empirical analysis using EFT-LR provides an in-depth investigation of the impacts of different LR policies on parameter-efficient LLM fine-tuning, offering practical guidelines for practitioners. We provide the source code at https://github.com/mlsysx/EFT-LR.
The Yelp Collaborative Knowledge Graph
The Yelp Open Dataset (YOD) is a widely used dataset for Recommender Systems (RS). Multiple Knowledge Graphs (KGs) have been built for YOD, but they have various issues: the conversion processes usually do not follow state-of-the-art methodologies, fail to properly link to other KGs, do not link to existing vocabularies, ignore important data, and are generally of small size. Instead, we present the Yelp Collaborative Knowledge Graph (YCKG), where we correctly integrate taxonomies, product categories, business locations, and the Yelp social network through common practices within the semantic web community, overcoming all these issues. As a result, the YCKG includes 150k businesses and 16.9M reviews from 1.9M distinct real users, resulting in over 244 million triples with 144 distinct predicates over about 72 million resources, with an average in-degree and out-degree of 3.3 and 12.2, respectively. Further, we release both the data and the code used to generate the KG for inspection and further extensions. This dataset can be used to develop and test both recommendation and data-mining algorithms able to exploit rich and semantically meaningful knowledge. We publish the code for the YCKG construction at: https://github.com/MadsCorfixen/The-Yelp-Collaborative-Knowledge-Graph.
Generative Recommendation with Semantic IDs: A Practitioner's Handbook
Generative recommendation (GR) has gained increasing attention for its promising performance compared to traditional models. A key factor contributing to the success of GR is the semantic ID (SID), which converts continuous semantic representations (e.g., from large language models) into discrete ID sequences. However, varied modeling techniques, hyper-parameters, and experimental setups in existing literature make direct comparisons between GR proposals challenging. Furthermore, the absence of an open-source, unified framework hinders systematic benchmarking and extension, slowing model iteration. To address this challenge, our work introduces and open-sources a framework for Generative Recommendation with semantic ID, namely GRID, specifically designed for modularity to facilitate easy component swapping and accelerate idea iteration. Using GRID, we systematically experiment with and ablate different components of GR models with SIDs on public benchmarks. Our comprehensive experiments with GRID reveal that many overlooked architectural components in GR models with SIDs substantially impact performance. This offers both novel insights and validates the utility of an open-source platform for robust benchmarking and GR research advancement. GRID is open-sourced at https://github.com/snap-research/GRID.
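As background on the SID step, one common recipe is residual quantization of item embeddings into short discrete code sequences. The sketch below illustrates that generic idea with arbitrary codebook sizes; it is not the GRID tokenizer itself.

```python
# Illustrative residual k-means quantization: each item embedding is mapped to a
# short sequence of discrete codes, one per level, with each level quantizing the
# residual left by the previous one. Levels and codebook size are arbitrary here.
import numpy as np
from sklearn.cluster import KMeans

def fit_residual_codebooks(embeddings, levels=3, codebook_size=8, seed=0):
    residual, codebooks, codes = embeddings.copy(), [], []
    for _ in range(levels):
        km = KMeans(n_clusters=codebook_size, random_state=seed, n_init=10).fit(residual)
        codebooks.append(km.cluster_centers_)
        codes.append(km.labels_)
        residual = residual - km.cluster_centers_[km.labels_]   # quantization residual
    return codebooks, np.stack(codes, axis=1)                   # item -> sequence of IDs

item_embs = np.random.default_rng(0).normal(size=(500, 64)).astype(np.float32)
_, semantic_ids = fit_residual_codebooks(item_embs)
print(semantic_ids[:3])   # three items, each as a length-3 discrete code sequence
```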
CausalBench-ER: Causally-Informed Explanations and Recommendations for Reproducible Benchmarking
Due to the critical role causality plays in decision-making, the state-of-the-art in machine learning for causality is rapidly evolving. With the rapid development and deployment of new models, datasets, and metrics, it is increasingly difficult for researchers and practitioners to identify the most suitable approach for their problem. Models exhibit different performance when trained on different data or even when they are used under different hardware/software platforms, making it challenging for users to select the appropriate setup for their problem. To address these difficulties, we present a computing framework, CausalBench-ER, that serves not only as a benchmarking platform for causal machine learning models, but also as a resource that can explain benchmarking results across different metrics, software, and hardware setups. Furthermore, CausalBench-ER recommends additional scenarios to consider, helping pave the way towards more robust benchmarking.
Datasets for Supervised Adversarial Attacks on Neural Rankers
We introduce a novel dataset for adversarial rank attacks against neural rankers, enabling systematic research on robustness. Unlike prior unsupervised or surrogate-based methods, our approach uses Retrieval-Augmented Generation (RAG) with a Large Language Model (LLM) to create high-quality adversarial examples that subtly alter rankings while maintaining coherence and relevance. Built via a self-refining LLM-Ranker feedback loop, the dataset includes two tiers: Gold and Diamond, based on attack strength, along with rich metadata, ranking labels, and quality metrics. Released with code and prompts, it supports training, evaluation, and benchmarking of robust ranking systems.
RuSemCor: A Word Sense Disambiguation corpus for Russian
We present RuSemCor, an open Word Sense Disambiguation (WSD) corpus for Russian. The corpus was constructed by manually linking tokens from the OpenCorpora corpus to senses in the Russian wordnet RuWordNet. It consists of 869 documents with 121,710 tokens, of which 51,588 are wordnet-annotated. The resource is represented using the NIF, OLiA, OntoLex, and Global WordNet ontologies and integrated into the Linguistic Linked Open Data cloud. We used RuSemCor as a diagnostic benchmark to evaluate a range of WSD methods. Our experiments yielded three main findings. 1) Generative LLMs substantially outperform traditional knowledge-based methods such as Personalized PageRank. 2) Despite their strengths, generative LLMs do not surpass encoder-based models specifically trained for WSD. 3) Incorporating lexical-semantic relations from RuWordNet produces mixed results: it enhances the performance of encoder-based models and leading LLMs like GPT-4, DeepSeek, and Mistral 24B, but tends to degrade accuracy for smaller generative models such as GPT-3 and Mistral 7B. The resource is distributed under the CC BY-SA open license and is available at: https://github.com/LLOD-Ru/rusemcor.
NLP-QA: A Large-scale Benchmark for Informative Question Answering over Natural Language Processing Documents
The exponential growth of research literature across AI domains necessitates efficient information extraction via Question Answering (QA). However, scholarly QA development is hindered by the scarcity of the large-scale, expertly annotated datasets needed for modern deep learning models. To address this gap and advance scholarly QA, we introduce NLP-QA, a new dataset of question-answer pairs derived from NLP research documents. We overcome the challenge of costly expert annotation by proposing a novel, automated construction method that leverages content from conference presentation slides. We create two versions of our dataset: one by extracting QA pairs from individual slides (NLP-QA-SS) and the other by extracting QA pairs from the collection of slides for a paper as a whole (NLP-QA-MS). We benchmark several Large Language Models (LLMs) on NLP-QA in zero-shot settings, with and without finetuning, to establish performance baselines. The results demonstrate the challenging nature of the dataset for zero-shot long-context reasoning without additional finetuning, as well as its utility: we observe a significant jump in the LLMs' performance after finetuning with NLP-QA. The dataset and code are publicly available at https://github.com/AvishekLahiri/NLP-QA.git.
From Rules to Flexibility: A Resource and Method for SEC Item Extraction in Post-2021 10-K Filings
10-K filings represent a significant repository for financial text analysis, encompassing both standardized quantitative indicators and rich unstructured text content. In recent years, the efficacy of rule-based extraction methods has been progressively limited by changes in the 10-K filing format. In this study, we propose a novel layout-robust segmentation approach that identifies financial report items by combining fuzzy matching and structural heuristics. Our approach has been applied to recent 10-K filings (2021-2024), resulting in a standardized dataset with item-level segmentation. Furthermore, an automated validation protocol was developed to assess coverage and ranking consistency. Analysis with this protocol indicates that our approach achieves an average extraction accuracy of 87.8%. Finally, a case study utilising Item 1A to forecast short-term stock volatility provides a practical demonstration of the application of the corpus. This case study not only serves to validate the corpus but also showcases its compatibility with EDGAR-CORPUS. Code, benchmarks, segmented 10-K filings, and case studies are publicly available on GitHub. Our GitHub repository: https://github.com/johnny-xiao-li/Flex_10K
A Comprehensive Toolkit for Generalized Robust Vision
While deep neural networks (DNNs) excel in computer vision tasks, their real-world deployment is hindered by robustness limitations compared to human perception. Adversarial attacks and data distribution shifts remain critical vulnerabilities, degrading model performance under practical conditions. To address these challenges and advance robustness research, we introduce a comprehensive, user-friendly toolkit for training, evaluating, and analyzing robust vision models. It targets two key dimensions of robustness: 1) adversarial robustness, defending against malicious worst-case perturbations (adversarial examples); and 2) natural robustness, maintaining performance under real-world corruptions and distribution shifts. Through extensive image classification benchmarks, our toolkit enables precise model assessment. We envision this toolkit accelerating the development of practically robust models and bridging the gap between machine and human vision capabilities.
ECKGBench: Benchmarking Large Language Models in E-commerce Leveraging Knowledge Graph
Large language models (LLMs) have demonstrated their capabilities across various natural language processing (NLP) tasks. Their potential in e-commerce is also substantial, evidenced by existing implementations in scenarios such as platform search and recommender systems. One persistent concern associated with LLMs is the factuality issue (e.g., hallucination), which is pressing in e-commerce due to its significant impact on user experience and revenue. While some methods aim to evaluate the factuality of LLMs, issues such as lack of objectivity, high cost, and lack of domain expertise arise. To this end, leveraging a collected knowledge graph (KG) as a reliable source, we propose ECKGBench, a question-answering dataset to assess LLMs' capacity in e-commerce. Specifically, each question is automatically generated based on one KG triple through a standardized pipeline, guaranteeing evaluation quality and reliability. We evaluate advanced LLMs using ECKGBench and provide insights into the experimental results. The dataset is available online at https://github.com/OpenStellarTeam/ECKGBench.
GCondenser: Benchmarking Graph Condensation
Large-scale graphs are valuable for graph representation learning, but the vast volume of data often hinders model building efficiency. Graph condensation (GC) addresses this challenge by compressing a large graph into a significantly smaller one that still supports effective model training. While recent studies have proposed various techniques to enhance condensation effectiveness, comprehensive and practical evaluations of these methods remain limited. In this paper, we introduce GCondenser, a large-scale graph condensation toolkit designed to facilitate flexible development, holistic evaluation and comparison of mainstream GC approaches. GCondenser provides a standardised GC pipeline with condensation, validation, and evaluation stages, and offers straightforward extensibility to accommodate new methods and datasets. Additionally, we conduct a thorough empirical study of existing GC methods, offering insights into multiple facets of condensation performance. The toolkit is available at https://github.com/superallen13/GCondenser.
CSMD: Curated Multimodal Dataset for Chinese Stock Analysis
The stock market is a complex and dynamic system, in which it is non-trivial for researchers and practitioners to uncover underlying patterns and forecast stock movements. Existing studies of stock market analysis rely on leveraging various types of information to extract useful factors, and are therefore highly dependent on the quality of the data used. However, the currently available resources are mainly based on the U.S. stock market and are in English, which makes them difficult to adapt to other countries. To address these issues, we propose CSMD, a multimodal dataset curated specifically for analyzing the Chinese stock market with meticulous processing for validated quality. In addition, we develop a lightweight and user-friendly framework, LightQuant, for researchers and practitioners with expertise in financial domains. Experimental results on top of our datasets and framework with various backbone models demonstrate their effectiveness compared with using existing datasets. The datasets and code are publicly available at: https://github.com/ECNU-CILAB/LightQuant.
Multimodal Banking Dataset: Understanding Client Needs through Event Sequences
Financial organizations collect a huge amount of temporal (sequential) data about clients, which is typically gathered from multiple sources (modalities). Despite the urgent practical need, the development of deep learning techniques suitable for handling such data is limited by the absence of large open-source multi-source real-world datasets of event sequences. To fill this gap, which is mainly caused by security concerns, we present the first industrial-scale publicly available multimodal banking dataset, MBD, that contains information on more than 2M corporate clients of a large bank. Clients are represented by several data sources: 950M bank transactions, 1B geo position events, 5M embeddings of dialogues with technical support, and monthly aggregated purchases of four bank products. All entries are properly anonymized from real proprietary bank data, and the experiments confirm that our anonymization still preserves all significant information for the introduced downstream tasks. MBD supports a campaigning task (predicting future customer purchases). We provide numerical results for state-of-the-art event sequence modeling techniques and demonstrate the superiority of fusion baselines over single-modal techniques for this task. HuggingFace Link: https://huggingface.co/datasets/ai-lab/MBD Github Link: https://github.com/Dzhambo/MBD
SparseKmeans: Efficient K-means Clustering For Sparse Data
We introduce SparseKmeans, the first Python package for fast K-means clustering on high-dimensional sparse data. Most existing K-means implementations, such as scikit-learn, are only optimized for dense data and do not run efficiently on sparse inputs. In this work, we thoroughly investigate how to accelerate widely used K-means algorithms on sparse data via matrix operations. In particular, we propose a new design of Elkan's method that aggregates distance computations and reduces fragmented memory access. By analyzing the structure of key matrices and leveraging highly optimized sparse matrix libraries, SparseKmeans achieves up to 9x speedup over scikit-learn. The package is available at https://github.com/cjlin1/sparsekmeans.
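The core of such acceleration is expressing distance computation as sparse matrix algebra. The sketch below shows the standard expansion ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2 on SciPy CSR data; it illustrates the general technique rather than the package's internal Elkan-style implementation.

```python
# General technique: compute all point-to-centroid squared distances on sparse
# data with one sparse-dense matrix product instead of densifying the points.
import numpy as np
import scipy.sparse as sp

def sparse_sq_distances(X, centers):
    """X: CSR sparse matrix [n, d]; centers: dense array [k, d]."""
    x_sq = np.asarray(X.multiply(X).sum(axis=1))   # ||x||^2, shape [n, 1]
    c_sq = (centers ** 2).sum(axis=1)              # ||c||^2, shape [k]
    cross = np.asarray(X @ centers.T)              # x.c via sparse-dense product, [n, k]
    return x_sq - 2 * cross + c_sq

X = sp.random(1000, 5000, density=0.001, format="csr", random_state=0)
centers = np.random.default_rng(0).random((10, 5000))
labels = sparse_sq_distances(X, centers).argmin(axis=1)   # nearest centroid per point
print(labels[:10])
```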
A Content-Driven Micro-Video Recommendation Dataset at Scale
Micro-videos have emerged as a popular form of content, leading to extensive research in micro-video recommendation with significant implications for the entertainment, advertising, and e-commerce industries. However, the lack of publicly available large-scale micro-video datasets presents a challenge for developing effective recommender systems. To address this challenge, we introduce a comprehensive and diverse micro-video recommendation dataset, referred to as "MicroLens." This dataset comprises nine million user-item interaction behaviors, one million users, and 91 thousand full-length micro-videos. It includes rich modality information such as titles, cover images, and audio associated with the videos. MicroLens serves as a benchmark for content-driven micro-video recommendation, allowing researchers to leverage diverse video modality information, particularly the raw video features, to enhance the effectiveness of recommender systems. This goes beyond the traditional reliance on item IDs or off-the-shelf pre-extracted video/visual features, providing new avenues for improving recommendation accuracy and personalization. We have conducted extensive experiments on MicroLens, benchmarking multiple recommender models and video encoders, which have provided valuable insights into the performance of micro-video recommendation. We anticipate that this dataset will not only benefit the recommender system community but also foster advancements in the field of video understanding. Our datasets, code, and additional documents are available at https://github.com/westlake-repl/MicroLens.
S2Cap: A Benchmark and a Baseline for Singing Style Captioning
Singing voices contain much richer information than common voices, including varied vocal and acoustic properties. However, current open-source audio-text datasets for singing voices capture only a narrow range of attributes and lack acoustic features, leading to limited utility towards downstream tasks, such as style captioning. To fill this gap, we formally define the singing style captioning task and present S2Cap, a dataset of singing voices with detailed descriptions covering diverse vocal, acoustic, and demographic characteristics. Using this dataset, we develop an efficient and straightforward baseline algorithm for singing style captioning. The dataset is available at https://zenodo.org/records/15673764.
Revisiting Pre-processing Group Fairness: A Modular Benchmarking Framework
As machine learning systems become increasingly integrated into high-stakes decision-making processes, ensuring fairness in algorithmic outcomes has become a critical concern. Methods to mitigate bias typically fall into three categories: pre-processing, in-processing, and post-processing. While significant attention has been devoted to the latter two, pre-processing methods, which operate at the data level and offer advantages such as model-agnosticism and improved privacy compliance, have received comparatively less focus and lack standardised evaluation tools. In this work, we introduce FairPrep, an extensible and modular benchmarking framework designed to evaluate fairness-aware pre-processing techniques on tabular datasets. Built on the AIF360 platform, FairPrep allows seamless integration of datasets, fairness interventions, and predictive models. It features a batch-processing interface that enables efficient experimentation and automatic reporting of fairness and utility metrics. By offering standardised pipelines and supporting reproducible evaluations, FairPrep fills a critical gap in the fairness benchmarking landscape and provides a practical foundation for advancing data-level fairness research.
QueryBridge: One Million Annotated Questions with SPARQL Queries - Dataset for Question Answering over Knowledge Graphs
Question answering over knowledge graphs (QAKG) involves interpreting natural language questions and linking them to structured knowledge graphs. Existing benchmark datasets (e.g., QALD, LC-QuAD) are limited in size and annotation, hindering QAKG model generalization. To address this, we present QueryBridge, a dataset with over one million annotated questions paired with SPARQL queries. Each question is tagged with essential elements (e.g., entities, relationships) and annotated by query shape (e.g., chain, star) to support complex reasoning.
Building Safer Sites: A Large-Scale Multi-Level Dataset for Construction Safety Benchmark
Construction safety research is a critical field in civil engineering, aiming to mitigate risks and prevent injuries through the analysis of site conditions and human factors. However, the limited volume and lack of diversity in existing construction safety datasets pose significant challenges to conducting in-depth analyses. To address this research gap, this paper introduces the Construction Safety Dataset (CSDataset), a well-organized, comprehensive multi-level dataset that encompasses incident, inspection, and violation records sourced from the Occupational Safety and Health Administration (OSHA). This dataset uniquely integrates structured attributes with unstructured narratives, facilitating a wide range of approaches driven by machine learning and large language models. We also conduct preliminary benchmarking of approaches and various cross-level analyses using our dataset, offering insights to inform and enhance future efforts in construction safety. For example, we found that complaint-driven inspections were associated with a 17.3% reduction in the likelihood of subsequent incidents. Our dataset and code are released at https://github.com/zhenhuiou/Construction-Safety-Dataset-CSDataset.
PEQQS: a Dataset for Probing Extractive Quantity-focused Question Answering from Scientific Literature
Question Answering (QA) and Information Retrieval (IR) play a crucial role in the information-seeking pipelines implemented in many emerging AI research assistant applications. Large Language Models (LLMs) have demonstrated exceptional effectiveness on QA tasks, with Retrieval Augmented Generation (RAG) techniques often boosting the results. However, in many of those emerging applications, the onus of conducting the actual literature search falls on the user, i.e., the user searches for the relevant literature and the LLM-based assistant extracts the solicited answers from each of the user-supplied documents. The interplay between the quality of the user-conducted search and the quality of the final results remains understudied. In this work, we focus on a specific version of such a pipeline, where users aim to obtain a specific quantity as an extractive answer (e.g., the value of a particular measurable parameter). To this end, we provide a dataset of 1031 agricultural sciences abstracts annotated with correct extractive answers. Additionally, this dataset builds on our previous work on quantity-centric search over a corpus of more than 3.3M documents, so it also includes 1104 query-document relevance judgments for 39 queries. The availability of both document-level annotations and corpus-level relevance judgments means that our dataset allows for an end-to-end evaluation of an information-seeking pipeline consisting of both literature search and the QA module. We present how our dataset can be used both for the evaluation of extractive quantity-focused QA from scientific literature and for exploring the impact of search on the downstream results, specifically focusing on hallucinations resulting from processing non-relevant documents with LLMs.
A Use-Case Specific Dataset for Measuring Dimensions of Responsible Performance in LLM-generated Text
Current methods for evaluating large language models (LLMs) typically focus on high-level tasks such as text generation, without targeting a particular AI application. This approach is not sufficient for evaluating LLMs for Responsible AI dimensions like fairness, since protected attributes that are highly relevant in one application may be less relevant in another. In this work, we construct a dataset that is driven by a real-world application (generate a plain-text product description, given a list of product features), parameterized by fairness attributes intersected with gendered adjectives and product categories, yielding a rich set of labeled prompts. We show how to use the data to identify quality, veracity, safety, and fairness gaps in LLMs, contributing a proposal for LLM evaluation paired with a concrete resource for the research community.
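As a rough illustration of the parameterization described above, the following sketch enumerates the cross product of a few hypothetical attribute values to produce labeled prompts. The attribute vocabulary, template, and product features are invented for illustration and are not the paper's actual dataset contents.

```python
from itertools import product

# Illustrative (assumed) parameter values; the paper's real vocabulary differs.
fairness_attributes = ["for women", "for men", "for non-binary shoppers"]
gendered_adjectives = ["elegant", "rugged", "delicate"]
product_categories = ["running shoes", "wristwatch", "backpack"]

template = ("Write a plain-text product description of a {category} "
            "marketed {attribute}. Feature list: {adjective}, lightweight, durable.")

prompts = [
    {"attribute": a, "adjective": adj, "category": c,
     "prompt": template.format(attribute=a, adjective=adj, category=c)}
    for a, adj, c in product(fairness_attributes, gendered_adjectives, product_categories)
]
print(len(prompts))  # 3 * 3 * 3 = 27 labeled prompts in this toy setting
```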
Real-E: A Foundation Benchmark for Advancing Robust and Generalizable Electricity Forecasting
Energy forecasting is vital for grid reliability and operational efficiency. Although recent advances in time series forecasting have led to progress, existing benchmarks remain limited in spatial and temporal scope and lack multi-energy features. This raises concerns about their reliability and applicability in real-world deployment. To address this, we present the Real-E dataset, covering over 74 power stations across 30+ European countries over a 10-year span with rich metadata. Using Real-E, we conduct an extensive data analysis and benchmark over 20 baselines across various model types. We introduce a new metric to quantify shifts in correlation structures and show that existing methods struggle on our dataset, which exhibits more complex and non-stationary correlation dynamics. Our findings highlight key limitations of current methods and offer a strong empirical basis for building more robust forecasting models.
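The abstract does not define the correlation-shift metric in detail; the snippet below is only a generic illustration of how a shift in cross-station correlation structure between two windows can be quantified (here via a normalized Frobenius norm), not the paper's exact formulation.

```python
import numpy as np

def correlation_shift(window_a: np.ndarray, window_b: np.ndarray) -> float:
    """Quantify how much the cross-station correlation structure changes between
    two time windows (rows = time steps, columns = stations).
    Generic illustration, not the exact metric defined for Real-E."""
    corr_a = np.corrcoef(window_a, rowvar=False)
    corr_b = np.corrcoef(window_b, rowvar=False)
    d = corr_a.shape[0]
    # Frobenius norm of the difference, normalized by the number of stations
    return float(np.linalg.norm(corr_a - corr_b, ord="fro") / d)

rng = np.random.default_rng(0)
early = rng.normal(size=(500, 10))                       # 10 synthetic stations
late = early @ np.diag(rng.uniform(0.5, 1.5, 10)) + 0.5 * rng.normal(size=(500, 10))
print(correlation_shift(early, late))
```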
SeLeRoSa: Sentence-Level Romanian Satire Detection Dataset
Satire, irony, and sarcasm are techniques that are typically used humorously or critically, rather than deceptively; they can occasionally be mistaken for factual reporting, akin to fake news. These techniques can be applied at a more granular level, allowing satirical information to be incorporated into news articles. In this paper, we introduce the first sentence-level dataset for Romanian satire detection for news articles, called SeLeRoSa. The dataset comprises 13,873 manually annotated sentences spanning various domains, including social issues, IT, science, and movies. With the rise and recent progress of large language models (LLMs) in the natural language processing literature, LLMs have demonstrated enhanced capabilities to tackle various tasks in zero-shot settings. We evaluate multiple baseline models based on LLMs in both zero-shot and fine-tuning settings, as well as transformer-based models. Our findings reveal the current limitations of these models in the sentence-level satire detection task, paving the way for new research directions.
When Facts Expire: Benchmarking Temporal Validity in Knowledge Graphs
Knowledge Graphs (KGs) are essential in applications like semantic search, question answering, and decision support. They structure knowledge, validate facts, enable inference, and increasingly enhance Large Language Models (LLMs) by grounding outputs in structured, factual data. However, KGs have traditionally treated facts as static and timeless, ignoring the temporal nature of many truths. This leads to outdated or incorrect inferences. Temporal Knowledge Graphs (TKGs), like Wikidata and YAGO, address this by modeling the time-bound validity of facts. Much recent work has focused on predicting missing temporal facts, yet validating existing temporal information, to ensure the reliability and accuracy of TKGs, remains underexplored. To advance this area of research, we introduce the first benchmark designed to evaluate temporal fact validation methods. Derived from Wikidata, this benchmark supports systematic, quantitative, and qualitative comparisons, incorporating diverse assumptions about temporal data (e.g., timestamps, intervals) and KG structures (e.g., density, depth).
MIRAGE: A Metrics lIbrary for Rating hAllucinations in Generated tExt
Errors in natural language generation, so-called hallucinations, remain a critical challenge, particularly in high-stakes domains such as healthcare or science communication. While several automatic metrics have been proposed to detect and quantify hallucinations, such as FactCC, QAGS, FEQA, and FactAcc, these metrics are often unavailable, difficult to reproduce, or incompatible with modern development workflows. We introduce MIRAGE, an open-source Python library designed to address these limitations. MIRAGE re-implements key hallucination evaluation metrics in a unified library built on the Hugging Face framework, offering modularity, reproducibility, and standardized inputs and outputs. By adhering to FAIR principles, MIRAGE promotes reproducibility, accelerates experimentation, and supports the development of future hallucination metrics. We validate MIRAGE by re-evaluating existing metrics on benchmark datasets, demonstrating comparable performance while significantly improving usability and transparency.
FinS-Pilot: A Benchmark for Online Financial RAG System
Large language models (LLMs) have demonstrated remarkable capabilities across various professional domains, with their performance typically evaluated through standardized benchmarks. In the financial field, the stringent demands for professional accuracy and real-time data processing often necessitate the use of retrieval-augmented generation (RAG) techniques. However, the development of financial RAG benchmarks has been constrained by data confidentiality issues and the lack of dynamic data integration. To address this issue, we introduce FinS-Pilot, a novel benchmark for evaluating RAG systems in online financial applications. Constructed from real-world financial assistant interactions, our benchmark incorporates both real-time API data and text data, organized through an intent classification framework covering critical financial domains. The benchmark enables comprehensive evaluation of financial assistants' capabilities in handling both static knowledge and time-sensitive market information. Through systematic experiments with multiple leading Chinese LLMs, we demonstrate FinS-Pilot's effectiveness in identifying models suitable for financial applications while addressing the current gap in specialized evaluation tools for the financial domain. Our work contributes both a practical evaluation framework and a curated dataset to advance research in financial NLP systems. The code and dataset are accessible on GitHub.
HUSK: A Hierarchically Structured Urban Knowledge Graph Dataset for Multi-Level Spatial Tasks
Urban spatial tasks span multiple levels, ranging from area-level analysis, crime prediction, and taxi demand forecasting to POI-level tasks such as new store recommendation. Urban knowledge graphs (UrbanKGs) can enhance these tasks by integrating structured urban knowledge. However, existing studies face two main issues: most research uses task-specific UrbanKGs for corresponding single-level predictions, and public UrbanKGs contain only coarse-grained administrative areas, lacking the rich semantic and spatial relationships required for multi-level tasks. We propose a Hierarchically Structured UrbanKG Dataset (HUSK) with an intermediate functional zone layer that bridges and enriches the understanding across multiple levels, and evaluate it on three area-level and three POI-level tasks, showing accuracy improvements over single-view baselines.
TalkDep: Clinically Grounded LLM Personas for Conversation-Centric Depression Screening
The increasing demand for mental health services has outpaced the availability of real training data to develop clinical professionals, leading to limited support for the diagnosis of depression. This shortage has motivated the development of simulated or virtual patients to assist in training and evaluation, but existing approaches often fail to generate clinically valid, natural, and diverse symptom presentations. In this work, we adopt recent advanced language models as the backbone and propose a novel clinician-in-the-loop patient simulation pipeline, TalkDep, with access to diversified patient profiles to develop simulated patients. By conditioning the model on psychiatric diagnostic criteria, symptom severity scales, and contextual factors, our goal is to create authentic patient responses that can better support diagnostic model training and evaluation. We verify the reliability of these simulated patients with thorough assessments conducted by clinical professionals. The availability of validated simulated patients offers a scalable and adaptable resource for improving the robustness and generalisability of automatic depression diagnosis systems.
StoryWriter: A Multi-Agent Framework for Long Story Generation
Long story generation remains a challenge for existing large language models (LLMs), primarily due to two main factors: (1) discourse coherence, which requires plot consistency, logical coherence, and completeness in long-form generation, and (2) narrative complexity, which requires an interwoven and engaging narrative. In this paper, we present StoryWriter, a modular and open-source multi-agent framework for controllable and scalable long story generation. We conduct both human and automated evaluation, and StoryWriter significantly outperforms existing story generation baselines in both story quality and length. Furthermore, we use StoryWriter to generate LongStory, a dataset containing about 6,000 high-quality long stories with an average length of 8,000 words. We train Llama3.1-8B and GLM4-9B with supervised fine-tuning on LongStory to develop StoryWriterLLAMA and StoryWriterGLM, which demonstrate advanced performance in long story generation. All code, models, and data are made publicly available to encourage further development.
Maneno Yetu: Dynamic Corpus Construction and Pretraining for Swahili NLP
Swahili occupies a central place in African linguistic landscapes, yet it is significantly under-resourced in NLP, reflecting a mismatch between speaker population and data availability. We introduce Maneno Yetu, a dynamic and extensible corpus designed to address this gap. It is continuously updated with diverse sources such as news articles, blogs, literature, and educational content. This structure enables robust pretraining and fine-tuning for Swahili NLP. The evolving nature of the corpus allows for longitudinal linguistic analysis, providing a unique opportunity to track language change over time. It also serves as a foundation for creating niche, task-specific datasets in low-resource settings. Building on Maneno Yetu, we present the Swahili Language Foundational Model (SLFM), a transformer-based model trained to support core NLP tasks including tokenization, part-of-speech tagging, machine translation, and abusive language detection. Both the corpus and model are released publicly to support reproducible research and foster community-driven development in African language technologies.
UXSim: Towards a Hybrid User Search Simulation
Simulating nuanced user experiences within complex interactive search systems poses distinct challenges for traditional methodologies, which often rely on static user proxies or, more recently, on standalone large language model (LLM) agents that may lack deep, verifiable grounding. The true dynamism and personalization inherent in human-computer interaction demand a more integrated approach. This work introduces UXSim (https://searchsim.org/uxsim), a novel framework that integrates both approaches. It leverages grounded data from traditional simulators to inform and constrain the reasoning of an adaptive LLM agent. This synthesis enables more accurate and dynamic simulations of user behavior while also providing a pathway for the explainable validation of the underlying cognitive processes.
C-FAITH: A Chinese Fine-Grained Benchmark for Automated Hallucination Evaluation
Despite the rapid advancement of large language models, they remain highly susceptible to generating hallucinations, which significantly hinders their widespread application. Hallucination research requires dynamic and fine-grained evaluation. However, most existing hallucination benchmarks (especially in Chinese) rely on human annotations, making automatic and cost-effective hallucination evaluation challenging. To address this, we introduce HaluAgent, an agentic framework that automatically constructs fine-grained question-answering (QA) datasets from knowledge documents. Our experiments demonstrate that manually designed rules and prompt optimization can improve the quality of generated data. Using HaluAgent, we construct C-FAITH, a Chinese QA hallucination benchmark created from 1,399 knowledge documents obtained from web scraping, totaling 60,702 entries. We comprehensively evaluate 16 mainstream LLMs with our proposed C-FAITH, providing detailed experimental results and analysis.
PyG-SSL: A Graph Self-Supervised Learning Toolkit
Graph Self-Supervised Learning (SSL) has emerged as a pivotal area of research in recent years. By engaging in pretext tasks to learn the intricate topological structures and properties of graphs using unlabeled data, these graph SSL models achieve enhanced performance, improved generalization, and heightened robustness. Despite the remarkable achievements of these graph SSL methods, their current implementations pose significant challenges for beginners and practitioners: the complex nature of graph structures, inconsistent evaluation metrics, and concerns regarding reproducibility hinder further progress in this field. Recognizing the growing interest within the research community, there is an urgent need for a comprehensive, beginner-friendly, and accessible toolkit consisting of the most representative graph SSL algorithms. To address these challenges, we present a Graph SSL toolkit named PyG-SSL, which is built upon PyTorch and is compatible with various deep learning and scientific computing backends. Within the toolkit, we offer a unified framework encompassing dataset loading, hyper-parameter configuration, model training, and comprehensive performance evaluation for diverse downstream tasks. Moreover, we provide beginner-friendly tutorials and the best hyper-parameters of each graph SSL algorithm on different graph datasets, facilitating the reproduction of results. The GitHub repository of the library is https://github.com/iDEA-iSAIL-Lab-UIUC/pyg-ssl.
TSD-CT: A Benchmark Dataset for Truthfulness Stance Detection
We present TSD-CT (Truthfulness Stance Detection-Claim and Tweet), a benchmark dataset designed to advance research in truthfulness stance detection. While prior stance detection datasets focus primarily on political figures, topics, or events, TSD-CT targets truthfulness stance of social media posts toward factual claims. Truthfulness stance reflects whether a post endorses a claim as true, rejects it as false, or expresses no clear position. This focus is particularly valuable for tracking public reactions to misinformation and for enabling applications that analyze belief dynamics in online discourse. TSD-CT comprises 5,331 claim-tweet pairs, each annotated into one of five classes: positive, negative, neutral/no stance, topically different, or problematic. To ensure annotation quality, we introduce a strategy that uses gold-standard labels to compute error scores, evaluate annotator performance, and filter out low-quality contributions. The resulting dataset achieves strong inter-annotator agreement. An error analysis further highlights frequent sources of confusion, particularly between neutral/no stance and other classes. The dataset, along with the annotation interface and codebase, is publicly released to facilitate further research.
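As a simplified sketch of the gold-label-based quality control described above (not the authors' exact procedure), one can score each annotator's error rate on gold-standard items and discard contributions from unreliable annotators:

```python
from collections import defaultdict

def annotator_error_scores(annotations, gold):
    """annotations: iterable of (annotator_id, item_id, label); gold: {item_id: label}.
    Returns each annotator's error rate on gold-labeled items."""
    seen, wrong = defaultdict(int), defaultdict(int)
    for annotator, item, label in annotations:
        if item in gold:                      # only score items with gold labels
            seen[annotator] += 1
            wrong[annotator] += int(label != gold[item])
    return {a: wrong[a] / seen[a] for a in seen}

def keep_reliable(annotations, gold, max_error=0.25):
    """Drop all contributions from annotators whose error rate exceeds max_error.
    The 0.25 threshold is an arbitrary illustration, not the paper's setting."""
    scores = annotator_error_scores(annotations, gold)
    reliable = {a for a, err in scores.items() if err <= max_error}
    return [rec for rec in annotations if rec[0] in reliable]
```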
SESSION: Demo Papers
RankArena: A Unified Platform for Evaluating Retrieval, Reranking and RAG with Human and LLM Feedback
Evaluating the quality of retrieval-augmented generation (RAG) and document reranking systems remains challenging due to the lack of scalable, user-centric, and multi-perspective evaluation tools. We introduce RankArena, a unified platform for comparing and analysing the performance of retrieval pipelines, rerankers, and RAG systems using structured human and LLM-based feedback, as well as for collecting such feedback. RankArena supports multiple evaluation modes: direct reranking visualisation, blind pairwise comparisons with human or LLM voting, supervised manual document annotation, and end-to-end RAG answer quality assessment. It captures fine-grained relevance feedback through both pairwise preferences and full-list annotations, along with auxiliary metadata such as movement metrics, annotation time, and quality ratings. The platform also integrates LLM-as-a-judge evaluation, enabling comparison between model-generated rankings and human ground truth annotations. All interactions are stored as structured evaluation datasets that can be used to train rerankers, reward models, judgment agents, or retrieval strategy selectors. Our platform is publicly available at https://rankarena.ngrok.io/, and a demo video is available at https://youtu.be/jIYAP4PaSSI.
White Rabbit: Demonstrating Online KG Pathfinding Using Embeddings
The paper introduces White Rabbit, a novel method for discovering high-quality, meaningful paths between entities in online Knowledge Graphs (KGs). Traditional exploration methods, such as SPARQL endpoints, struggle due to the large size and complexity of KGs. The proposed approach addresses this by introducing the problem of context-aware path finding, ensuring that retrieved paths are coherent and involve highly relevant entities. White Rabbit uses embeddings to score entity neighbors, a queue-based prioritization mechanism, and an iterative refinement process to improve efficiency and relevance. The system is demonstrated live, allowing participants to test it and compare against baseline methods (structural approaches, pretrained embeddings, and large language models). Results show that White Rabbit enhances both the efficiency of exploration and the quality of discovered paths.
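The following is a minimal, generic sketch of embedding-guided, queue-prioritized pathfinding of the kind the abstract describes; the `neighbors` and `embed` callables are placeholders, and White Rabbit's actual scoring and iterative refinement are richer than this best-first search.

```python
import heapq
import numpy as np

def guided_path(start, target, neighbors, embed, max_expansions=10_000):
    """Best-first search over a KG, expanding neighbors in order of embedding
    similarity to the target. Entities are assumed to be strings.
    neighbors(entity) -> iterable of (relation, next_entity); embed(entity) -> np.ndarray."""
    def score(entity):
        a, b = embed(entity), embed(target)
        # negative cosine similarity so the min-heap pops the most promising entity first
        return -float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    frontier = [(score(start), start, [])]
    visited = {start}
    for _ in range(max_expansions):
        if not frontier:
            break
        _, entity, path = heapq.heappop(frontier)
        if entity == target:
            return path                      # list of (head, relation, tail) hops
        for relation, nxt in neighbors(entity):
            if nxt not in visited:
                visited.add(nxt)
                heapq.heappush(frontier, (score(nxt), nxt, path + [(entity, relation, nxt)]))
    return None
```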
SQuAI: Scientific Question-Answering with Multi-Agent Retrieval-Augmented Generation
We present SQuAI (https://squai.scads.ai/), a scalable and trustworthy multi-agent retrieval-augmented generation (RAG) framework for scientific question answering (QA) with large language models (LLMs). SQuAI addresses key limitations of existing RAG systems in the scholarly domain, where complex, open-domain questions demand accurate answers, explicit claims with citations, and retrieval across millions of scientific documents. Built on over 2.3 million full-text papers from arXiv.org, SQuAI employs four collaborative agents to decompose complex questions into sub-questions, retrieve targeted evidence via hybrid sparse-dense retrieval, and adaptively filter documents to improve contextual relevance. To ensure faithfulness and traceability, SQuAI integrates in-line citations for each generated claim and provides supporting sentences from the source documents. Our system improves faithfulness, answer relevance, and contextual relevance by up to +0.088 (12%) over a strong RAG baseline. We further release a benchmark of 1,000 scientific question-answer-evidence triplets to support reproducibility. With transparent reasoning, verifiable citations, and domain-wide scalability, SQuAI demonstrates how multi-agent RAG enables more trustworthy scientific QA with LLMs.
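Hybrid sparse-dense retrieval is commonly implemented by normalizing and mixing the two score lists; the sketch below shows that standard recipe for illustration only, since SQuAI's exact fusion rule is not given in the abstract.

```python
import numpy as np

def hybrid_scores(sparse, dense, alpha=0.5):
    """Fuse sparse (e.g., BM25) and dense (e.g., cosine) scores by min-max
    normalizing each list and taking a convex combination.
    sparse, dense: {doc_id: score} over the same candidate pool."""
    def normalize(scores):
        vals = np.array(list(scores.values()), dtype=float)
        lo, hi = vals.min(), vals.max()
        span = (hi - lo) or 1.0              # avoid division by zero when all scores tie
        return {doc: (s - lo) / span for doc, s in scores.items()}

    s, d = normalize(sparse), normalize(dense)
    docs = set(s) | set(d)
    return {doc: alpha * s.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0) for doc in docs}

fused = hybrid_scores({"d1": 12.3, "d2": 7.9}, {"d1": 0.62, "d2": 0.71})
best = max(fused, key=fused.get)
```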
KnowFE: A Hybrid AI System for Explainable Feature Engineering using Knowledge-Guided Reinforcement Learning
Feature engineering is a critical yet often manual step in building effective machine learning models. While automated machine learning (AutoML) has streamlined many aspects of model development, the generation of high-quality, interpretable features remains a key bottleneck, requiring case-by-case domain knowledge and significant effort. This challenge highlights the importance of automated feature engineering (AutoFE) as a critical component within the AutoML pipeline. To address this, we recently proposed SMART, a novel AutoFE approach that combines knowledge graph reasoning and deep reinforcement learning to guide the generation of interpretable features. In this demonstration, we introduce KnowFE, a web-based AutoFE platform powered by SMART. KnowFE enables users to generate high-quality, human-understandable features without writing any code, striking a balance between explainability and predictive performance. With a user-friendly interface, it empowers data practitioners to efficiently enhance machine learning workflows across diverse domains. A video demonstration is available at https://www.KnowFE.com.
SDD: Shape-aware Data-driven Attention Mechanism for Time Series Analysis
Multivariate time series (MTS) analysis has extensive applications in various areas such as human activity recognition, healthcare, and economics, among others. Recently, Transformer approaches have been specifically designed for MTS and have consistently reported superior performance. In this paper, we demonstrate a software system for a recent efficient shape-aware Transformer (SDD), where time-series subsequences (a.k.a. shapes) are made available to users for investigation. First, a time-series Transformer, called SVP-T, takes shapes, together with their variable position information (VP information), as input to the training of a Transformer model. These shapes are computed from different variables and time intervals, enabling the Transformer model to learn dependencies simultaneously across both time and variables. Second, a data-driven kernel-based attention mechanism, called DARKER, reduces the time complexity of training Transformer models from O(N²) to O(N), where N is the number of inputs. As a result, training with DARKER offers about a 3x-4x speedup over vanilla Transformers. In this demo, we present the first system (SDD) that integrates SVP-T and DARKER. In particular, SDD visualizes SVP-T's attention matrix and allows users to explore key shapes that have high attention weights. Furthermore, users can use SDD to decide the shape input to train a new model, to further balance efficiency and accuracy.
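To make the O(N²)-to-O(N) claim concrete, the snippet below shows the standard kernel feature-map linearization of attention, which avoids materializing the N x N attention matrix. DARKER learns its kernel from data, which this generic sketch does not reproduce.

```python
import numpy as np

def linear_attention(Q, K, V, feature_map=lambda x: np.maximum(x, 0) + 1e-6):
    """Kernel feature-map attention computed in O(N) memory and time in N.
    Q, K: (N, d); V: (N, d_v). The ReLU-plus-epsilon map keeps features positive
    so the per-query normalizer is nonzero; it stands in for a learned kernel."""
    Qf, Kf = feature_map(Q), feature_map(K)          # (N, d)
    KV = Kf.T @ V                                    # (d, d_v): aggregate keys/values once
    Z = Qf @ Kf.sum(axis=0)                          # (N,): per-query normalizer
    return (Qf @ KV) / Z[:, None]                    # (N, d_v)

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(1000, 16)), rng.normal(size=(1000, 16)), rng.normal(size=(1000, 8))
out = linear_attention(Q, K, V)   # shape (1000, 8); no 1000 x 1000 matrix is ever formed
```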
The ReQAP System for Question Answering over Personal Information
Personal information is abundant on users' devices, from structured data in calendars, shopping records, or fitness tools, to unstructured contents in mail and social media posts. This work presents the ReQAP system that supports users with answers for complex questions that involve filters, joins, and aggregation over heterogeneous sources. The unique trait of ReQAP is that it recursively decomposes questions and incrementally builds an operator tree for execution. Both the question interpretation and the individual operators make smart use of light-weight language models, with judicious fine-tuning. The demo showcases the rich functionality for advanced user questions, and also offers detailed tracking of how the answers are computed by the operators in the execution tree. Being able to trace answers back to the underlying sources is vital for human comprehensibility and user trust in the system.
Explain and Monitor Deep Learning Models for Computer Vision using Obz AI
Deep learning has transformed computer vision (CV), achieving outstanding performance in classification, segmentation, and related tasks. Such AI-based CV systems are becoming prevalent, with applications spanning from medical imaging to surveillance. State-of-the-art models such as convolutional neural networks (CNNs) and vision transformers (ViTs) are often regarded as ''black boxes,'' offering limited transparency into their decision-making processes. Despite recent advances in explainable AI (XAI), explainability remains underutilized in practical CV deployments. A primary obstacle is the absence of integrated software solutions that connect XAI techniques with robust knowledge management and monitoring frameworks. To close this gap, we have developed Obz AI, a comprehensive software ecosystem designed to facilitate state-of-the-art explainability and observability for vision AI systems. Obz AI provides a seamless integration pipeline, from a Python client library to a full-stack analytics dashboard. With Obz AI, a machine learning engineer can easily incorporate advanced XAI methodologies, extract and analyze features for outlier detection, and continuously monitor AI models in real time. By making the decision-making mechanisms of deep models interpretable, Obz AI promotes observability and responsible deployment of computer vision systems.
ClimBurst: A Dynamic Visualization Tool to Display Climatological Anomalies over Time and Space
Detecting abnormal climate events across temporal and spatial scales is crucial to the understanding of local and regional climate trends. This demonstration introduces ClimBurst, a dynamic tool to detect climate bursts, which are unusually high or low values of one or more climate variables over some time interval. ClimBurst detects bursts without prior assumptions about their temporal duration. The demonstration will allow users to interact directly with our system to see a summary showing the presence or absence of bursts over a user-specified year and spatial range. The demonstration will also allow users to perform time-travel queries to see how bursts propagate over space and time.
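As a rough, assumption-laden stand-in for the burst notion above (ClimBurst's actual detector is more sophisticated), one can scan windows of several lengths and flag those whose mean deviates strongly from the series-wide mean:

```python
import numpy as np

def find_bursts(series, window_sizes=(7, 30, 90), z_thresh=3.0):
    """Flag intervals whose mean deviates strongly from the overall mean,
    scanning several window lengths so no single duration is assumed.
    Returns (start_index, end_index, z_score) tuples. Illustration only."""
    x = np.asarray(series, dtype=float)
    mu, sigma = x.mean(), x.std() or 1.0
    bursts = []
    for w in window_sizes:
        se = sigma / np.sqrt(w)                              # std. error of a window mean (i.i.d. approx.)
        means = np.convolve(x, np.ones(w) / w, mode="valid") # rolling means of length w
        z = (means - mu) / se
        for start in np.flatnonzero(np.abs(z) >= z_thresh):
            bursts.append((int(start), int(start) + w, float(z[start])))
    return bursts
```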
CALLM: A Framework for Systematic Contrastive Analysis of Large Language Models
This study addresses the challenges of analyzing discrepancies between different large language models (LLMs). To facilitate the automatic exploration of these differences, we propose a novel system called CALLM (Contrastive Analyzer of LLMs) that systematically compares the outputs of two LLM versions based on user-defined queries. The system first generates a hierarchical topic structure rooted in a user-specified query, allowing for an organized comparison of topical categories. Subsequently, it evaluates the text generated by both LLMs to identify differences in knowledge and information presentation. This fully automated approach not only streamlines the identification of differences in knowledge stored by LLMs, model-specific characteristics, and performance variations, but can also enhance our understanding of architectural and training differences between compared LLMs. Our work contributes to the development of more transparent machine learning models and is meant to foster research in model evaluation and comparative analysis.
HealthGenie: A Knowledge-Driven LLM Framework for Tailored Dietary Guidance
Seeking dietary guidance often requires navigating complex nutritional knowledge while considering individual health needs. To address this, we present HealthGenie, an interactive platform that leverages the interpretability of knowledge graphs (KGs) and the conversational power of large language models (LLMs) to deliver tailored dietary recommendations alongside integrated nutritional visualizations for fast, intuitive insights. Upon receiving a user query, HealthGenie performs intent refinement and maps the user's needs to a curated nutritional knowledge graph. The system then retrieves and visualizes relevant subgraphs, while offering detailed, explainable recommendations. Users can interactively adjust preferences to further tailor results. A within-subject study and quantitative analysis show that HealthGenie reduces cognitive load and interaction effort while supporting personalized, health-aware decision-making.
STORM: Spatio-Temporal Similar Trajectory Retrieval on Non-Uniform Maritime Data
Similar trajectory retrieval is crucial for maritime trajectory data analysis. However, due to issues such as errors in maritime positioning devices and the accuracy limitations of satellite positioning systems at sea, maritime trajectory data often exhibit characteristics of non-uniform sampling. Existing algorithms struggle to effectively model the irregularity of non-uniformly sampled maritime trajectories, leading to reduced performance in similar trajectory retrieval. In this demonstration, we present STORM, a system designed to effectively retrieve the top-k similar trajectories, which supports both user-specified and automated query settings. STORM utilizes a learnable Fourier-based encoding method to efficiently extract spatiotemporal features from non-uniform trajectories, significantly enhancing the model's performance in similar trajectory retrieval. Our demonstration shows that, compared to state-of-the-art (SOTA) methods, STORM achieves a 41.9% improvement in performance for similar trajectory retrieval on non-uniform maritime data. Our demonstration video is available at https://github.com/itszzzyyy/STORM.
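The snippet below sketches a learnable Fourier encoding of irregular timestamps, a common building block for non-uniformly sampled sequences; STORM's actual encoder additionally handles spatial coordinates and further layers, which are not reproduced here.

```python
import torch
import torch.nn as nn

class LearnableFourierTimeEncoding(nn.Module):
    """Encode irregular timestamps with learnable Fourier features.
    Generic sketch of the idea, not STORM's implementation."""
    def __init__(self, num_frequencies: int = 16, out_dim: int = 64):
        super().__init__()
        self.freq = nn.Parameter(torch.randn(num_frequencies))   # learnable frequencies
        self.phase = nn.Parameter(torch.zeros(num_frequencies))  # learnable phases
        self.proj = nn.Linear(2 * num_frequencies, out_dim)

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: (batch, seq_len) raw timestamps, possibly irregularly spaced
        angles = t.unsqueeze(-1) * self.freq + self.phase         # (batch, seq_len, F)
        feats = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
        return self.proj(feats)                                   # (batch, seq_len, out_dim)

enc = LearnableFourierTimeEncoding()
timestamps = torch.cumsum(torch.rand(4, 50), dim=1)               # non-uniform gaps
embeddings = enc(timestamps)                                      # (4, 50, 64)
```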
Quantum Deepflow: A Quantum-Integrated Forecasting Platform for Strategic Decisions in Raw Material Procurement
We present Quantum Deepflow, a forecasting and decision support platform that integrates classical and quantum sequence modeling to address volatility and data irregularity in raw material procurement. The system combines an LSTM autoencoder with a Quantum Long Short-Term Memory (QLSTM) model, which enables robust and accurate forecasts from noisy time-series inputs. Users can interact with the platform through a visual interface that links forecast outputs to strategic key performance indicators such as purchase timing, cost estimates, and inventory risk. In a real-world deployment at a Korean steel manufacturer, the system achieved a 32.5% reduction in overstocking and saved $1.8 million in inventory costs. This work demonstrates a practical approach to exposing quantum-enhanced forecasting capabilities through an automated, cloud-based interface that bridges the gap between emerging quantum technology and enterprise-scale decision-making.
How to Make Museums More Interactive? Case Study of Artistic Chatbot
Conversational agents powered by Large Language Models (LLMs) are increasingly utilized in educational settings, in particular in individual closed digital environments, yet their potential adoption in physical learning environments like cultural heritage sites, museums, and art galleries remains relatively unexplored. In this study, we present Artistic Chatbot, a voice-to-voice RAG-powered chat system to support informal learning and enhance visitor engagement during a live art exhibition celebrating the 15th anniversary of the Faculty of Media Art at the Warsaw Academy of Fine Arts, Poland. The question answering (QA) chatbot responded to free-form spoken questions in Polish using the context retrieved from a curated, domain-specific knowledge base consisting of 226 documents provided by the organizers, including faculty information, art magazines, books, and journals. We describe the key aspects of the system architecture and user interaction design, and discuss the practical challenges associated with deploying chatbots at public cultural sites. Our findings, based on interaction analysis, demonstrate that chatbots such as Artistic Chatbot effectively maintain responses grounded in exhibition content (60% of responses directly relevant), even when faced with unpredictable queries outside the target domain, showing their potential for increasing interactivity in public cultural sites. During the demo presentation, the audience will be invited to query our Artistic Chatbot, which adopts the persona of an artificial art curator, a role that involves responding to questions while simultaneously assessing their relevance to the exhibition. The demo video is available at https://github.com/cinekucia/artistic-chatbot-cikm2025.
EdgeSLU: 1.58-bit Voice Control Framework
Natural-language voice interfaces promise ubiquitous smart environment control, yet cloud dependence incurs latency, connectivity, and privacy costs that are intolerable in safety-critical or bandwidth-limited settings. We introduce EdgeSLU, an entirely on-device speech pipeline that marries an extremely low 1.58-bit mixed-precision quantization scheme with an auto-tuned, SIMD-centric kernel, allowing deployment on commodity edge hardware with only tens of MB of memory. On a Raspberry Pi 5 (Arm Cortex-A76), the Speech-To-Text (STT) engine achieves 6.37% WER in 35 MB RAM and 2.1 s latency, while the quantized Natural Language Understanding (NLU) model delivers 93.33% intent accuracy in 0.48 s and 23 MB RAM, yielding an end-to-end interaction time of 2.6 s. A lightweight paraphrase-based data generator bootstraps rich training sets from few examples, eliminating prohibitive annotation overhead. Demonstrated through offline control of Philips Hue lamps, EdgeSLU shows that aggressive mixed-precision quantization plus hardware-aware inference enables practical, privacy-preserving voice control on off-the-shelf edge devices.
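For intuition, 1.58 bits corresponds to ternary weights (log2 3 ≈ 1.58). The sketch below shows a minimal per-tensor ternary quantizer in that spirit; EdgeSLU's mixed-precision scheme and SIMD kernels involve considerably more engineering.

```python
import numpy as np

def ternary_quantize(W: np.ndarray):
    """Quantize a weight matrix to {-1, 0, +1} with a single per-tensor scale,
    i.e. the ~1.58-bit regime. Minimal illustration, not EdgeSLU's scheme."""
    scale = np.mean(np.abs(W)) + 1e-8
    Wq = np.clip(np.round(W / scale), -1, 1).astype(np.int8)
    return Wq, scale

def ternary_matmul(x: np.ndarray, Wq: np.ndarray, scale: float) -> np.ndarray:
    # At inference, the {-1, 0, 1} matrix multiply is cheap; the scale is applied once.
    return (x @ Wq.astype(np.float32)) * scale

W = np.random.randn(256, 256).astype(np.float32)
Wq, s = ternary_quantize(W)
x = np.random.randn(1, 256).astype(np.float32)
approx_error = np.abs(ternary_matmul(x, Wq, s) - x @ W).mean()
```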
PlaceSim: An LLM-based Interactive Platform for Human Behavior Simulation in Physical Facilities
Physical facility design faces a fundamental cold-start problem: predicting human behavior in non-existent spaces. Traditional surveys and observational studies create gaps between stated preferences and actual usage, while existing simulation tools require significant technical expertise, limiting accessibility. We introduce PlaceSim, a web-based platform leveraging Large Language Models (LLMs) to simulate realistic human behavior in facilities through a zero-code interface. PlaceSim employs a Persona-Environment-Scenario (P.E.S.) framework that structures LLM reasoning through context-aware AI personas with transparent decision-making processes. The platform provides interactive facility design, AI-driven persona generation, live simulation with reasoning visualization, and what-if analysis for scenario comparison. Evaluated on 18 months of real-world apartment facility data (789,238 usage records from 8,435 residents), our zero-shot approach achieves Jensen-Shannon Divergence scores as low as 0.006, outperforming both supervised learning methods and existing LLM-based tools like SocioVerse without requiring training data. PlaceSim establishes new benchmarks for spatial behavior prediction while providing immediate, actionable insights for architects, urban planners, and facility managers through systematic simulation. The platform is available at https://simulation-viewer.vercel.app/.
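The Jensen-Shannon divergence reported above can be computed as follows; the usage counts in this snippet are invented purely to show the calculation.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Compare a simulated facility-usage distribution against observed usage with
# Jensen-Shannon divergence. The counts below are made-up examples.
observed = np.array([310, 120, 95, 264], dtype=float)    # e.g., gym, lounge, study, pool visits
simulated = np.array([298, 131, 88, 270], dtype=float)

observed /= observed.sum()
simulated /= simulated.sum()

# scipy returns the JS *distance* (square root of the divergence); square it.
jsd = jensenshannon(observed, simulated, base=2) ** 2
print(round(float(jsd), 6))
```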
Achoio: A Skill-Aware Evaluation Management System for Text-To-Speech Research
Human subjective evaluation plays a crucial role in evaluating speech-related generative tasks such as text-to-speech (TTS) generation. However, current practices are often constrained by limited scalability, fragmented workflows, and inconsistent rating reliability. Researchers frequently rely on manual methods or general-purpose crowdsourcing systems, where recruiting appropriately skilled listeners is challenging, and result analysis is labor-intensive. In this work, we introduce Achoio, a dedicated end-to-end online system designed to streamline and scale human evaluation for the TTS research community. Achoio allows researchers to create and manage evaluation projects, upload synthesized speech samples, and automatically match them with qualified listeners based on linguistic proficiency and domain knowledge. The system provides built-in tools for project status tracking, result aggregation and visualization. In this demonstration, we will walk through the core features of Achoio, including intuitive project setup, skill-based listener matching algorithm, and automated analytics. By addressing the limitations of existing workflows, Achoio offers a scalable, domain-aware, and analysis-ready solution for conducting high-quality subjective TTS evaluations. Our system is live and can be found at https://www.achoio.com. Demo is available on YouTube at https://youtu.be/Ugjj3_YooSM.
LLM4IA: Index Advising Via Large Language Models
Recently, large language models (LLMs) have demonstrated strong potential to solve database problems. However, LLMs still face two challenges in solving the index selection problem: (1) representing the workload in an LLM-friendly form and (2) finding the optimal index set. To address these challenges, we propose LLM4IA, an LLM-based index selection method that can recommend indexes for any analytical workload directly on any database instance. LLM4IA produces a concise natural-language description of the workload by extracting and sorting predicates while avoiding numerical input entirely. LLM4IA adopts an iterative index selection process that repeatedly improves previous index candidates and summarizes effective candidates. Experiments on TPC-H and TPC-DS show that LLM4IA surpasses the near-optimal index advisor Extend by 5%-10%. Our demonstration highlights how LLM4IA recommends high-quality indexes for a new database instance without expensive retraining or fine-tuning.
ESPRESSO: Privacy-Preserving Keyword Search on Decentralized Data with Differential Visibility Constraints
We present ESPRESSO, a system designed for scalable and privacy-preserving keyword search in decentralized data cooperatives. It addresses the challenges that differential access control (allowing different data access rights for different search parties) poses to decentralized search by leveraging decentralized indexing, metadata-driven source selection, and decentralized ranking techniques. The system ensures that search parties can only access data within their data visibility scope, while maintaining high retrieval efficiency and result quality. This demo will showcase its functionality through interactive scenarios, including live querying, dynamic source selection, and real-time visualization of search results. The audience will have hands-on interaction with the system, exploring its application in real-world scenarios, such as in the healthcare domain.
terazi: AI Fairness Tool for Doubly Imbalanced Data
The field of Artificial Intelligence (AI) fairness focuses on developing unbiased approaches for machine learning problems, with many contributions and ready-to-use tools. However, existing solutions fall short when both sensitive attributes and target labels have imbalanced representations in a given dataset. Our proposed algorithm and its tool, terazi, aim to provide a fair AI solution for this doubly imbalanced case. The proposed solution is based on finding the optimal distribution within the imbalanced data to balance fairness and classification performance, and the tool facilitates using this solution. In this demonstration, we showcase the capabilities of our algorithm and the easy-to-use GUI of our web application for data scientists, researchers, and AI practitioners.
AnDri: A System for Anomaly and Drift co-Detection
The presence of concept drift poses challenges for anomaly detection in time series. While anomalies are caused by undesirable changes in the data, differentiating abnormal changes from varying normal behaviours is difficult due to differing frequencies of occurrence, varying time intervals when normal patterns occur, and the difficulty of identifying similarity thresholds that separate normal from abnormal sequences. Differentiating between concept drift and anomalies is critical for accurate analysis, as studies have shown that the compounding effects of error propagation in downstream data analysis tasks lead to lower detection accuracy and increased overhead due to unnecessary model updates. Unfortunately, existing work has largely explored anomaly detection and concept drift detection in isolation. We develop AnDri, a system for Anomaly detection in the presence of Drift, which enables users to interactively co-explore the interaction of anomalies and drift. Our system demonstration provides two motivating scenarios that extend existing anomaly detection baselines with partial labels towards improved co-detection accuracy, and highlights the superiority of AnDri over these baselines.
L3X: Long Object List Extraction from Long Documents
Information extraction with LLMs is typically geared toward extracting individual subject-predicate-object (SPO) triples from short factual texts such as Wikipedia or news articles. In contrast, the L3X methodology tackles the task of extracting long lists from long texts: given a target subject S and predicate P, the goal is to extract the complete list of all objects O for which SPO holds. This is especially challenging over long texts, like entire books or large web crawls, where many objects are long-tail entities. We demonstrate L3X, a web-based system designed for this previously unexplored task. L3X comprises recall-oriented candidate generation using LLMs in RAG mode, with novel methods for ranking and batching passages, followed by precision-oriented scrutinization. Our demo supports exploring multiple configurations, including LLM-only and RAG baselines, showcasing use cases like fiction-character relations from book series (e.g., 50+ friends of Harry Potter) and business relations from web pages (e.g., CEOs of Toyota).
The Temporal Game: A New Perspective on Temporal Relation Extraction
In this paper, we demo the Temporal Game, a novel approach to temporal relation extraction that casts the task as an interactive game. Instead of directly annotating interval-level relations, our approach decomposes them into point-wise comparisons between the start and end points of temporal entities. At each step, players classify a single point relation, and the system applies temporal closure to infer additional relations and enforce consistency. This point-based strategy naturally supports both interval and instant entities, enabling more fine-grained and flexible annotation than any previous approach. The Temporal Game also lays the groundwork for training reinforcement learning agents, by treating temporal annotation as a sequential decision-making task. To showcase this potential, the demo presented in this paper includes a Game mode, in which users annotate texts from the TempEval-3 dataset and receive feedback based on a scoring system, and an Annotation mode, which allows custom documents to be annotated and the resulting timeline to be exported. Therefore, this demo serves both as a research tool and an annotation interface. The demo is publicly available at https://temporal-game.inesctec.pt, and the source code is open-sourced to foster further research and community-driven development in temporal reasoning and annotation.
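A toy sketch of the point-wise closure idea (not the demo's implementation) is shown below: classified '<' and '==' relations between interval endpoints are closed under transitivity and checked for consistency.

```python
from itertools import product

def temporal_closure(before, equal):
    """before: set of (a, b) pairs meaning point a < point b; equal: pairs meaning a == b.
    Returns transitively closed copies of both sets, raising on inconsistency."""
    before, equal = set(before), set(equal)
    equal |= {(b, a) for a, b in equal}                   # symmetry of equality
    changed = True
    while changed:
        prev = (len(before), len(equal))
        # transitivity of '<' and propagation of '<' through '=='
        before |= {(a, d) for (a, b), (c, d) in product(before, before) if b == c}
        before |= {(a, d) for (a, b) in before.copy() for (c, d) in equal if b == c}
        before |= {(a, d) for (a, b) in equal for (c, d) in before.copy() if b == c}
        equal |= {(a, d) for (a, b), (c, d) in product(equal, equal) if b == c}
        changed = (len(before), len(equal)) != prev
    # a < b together with b < a, or a < b together with a == b, is inconsistent
    if any((b, a) in before for a, b in before) or any(p in equal for p in before):
        raise ValueError("inconsistent point relations")
    return before, equal

b, e = temporal_closure({("A.start", "A.end"), ("A.end", "B.start")},
                        {("B.start", "C.start")})
# closure infers, e.g., ("A.start", "B.start") and ("A.end", "C.start")
```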
Compare: A Framework for Scientific Comparisons
Navigating the vast and rapidly growing sea of academic publications to identify institutional synergies, benchmark research contributions, and pinpoint key findings has become an increasingly daunting task. Existing tools provide useful overviews or single-document insights, but none supports structured, qualitative comparisons across institutions or publications. To address this, we demonstrate Compare, a novel framework that tackles this challenge by enabling sophisticated long-context comparisons of scientific contributions. Compare empowers users to explore and analyze research overlaps and differences at both the institutional and publication granularity, all driven by user-defined questions and automatic retrieval over online resources. For this, we leverage Retrieval-Augmented Generation over evolving data sources to foster long-context knowledge synthesis. Unlike traditional scientometric tools, Compare goes beyond quantitative indicators by providing qualitative, citation-supported comparisons.
A Demonstration of PKGem: Secure Enrichment of Personal Knowledge Graphs
We present PKGem, a system that provides an end-to-end secure solution to enrich personal knowledge graphs in mobile environments. This task faces two core challenges: First, the proprietary, user-centric, and locally stored nature of personal knowledge graphs makes collaborative enrichment with socially connected peers a privacy concern. Moreover, the mobile environment has strict constraints on resource and computation cost, requiring lightweight and efficient design. PKGem addresses both challenges by leveraging cryptographic techniques to enable secure data enrichment across personal knowledge graphs, while remaining practical under mobile constraints. The system is implemented as an Android application and supports a variety of real-world usage scenarios. The code of PKGem is available at https://github.com/golden-eggs-lab/pkgem, with a demonstration video link included in the repository.
MMM-fair: An Interactive Toolkit for Exploring and Operationalizing Multi-Fairness Trade-offs
Fairness-aware classification requires balancing performance and fairness, often intensified by intersectional biases. Conflicting fairness definitions further complicate the task, making it difficult to identify universally fair solutions. Despite growing regulatory and societal demands for equitable AI, popular toolkits offer limited support for exploring multi-dimensional fairness and related trade-offs. To address this, we present mmm-fair, an open-source toolkit leveraging boosting-based ensemble approaches that dynamically optimizes model weights to jointly minimize classification errors and diverse fairness violations, enabling flexible multi-objective optimization. The system empowers users to deploy models that align with their context-specific needs while reliably uncovering intersectional biases often missed by state-of-the-art methods. In a nutshell, mmm-fair uniquely combines in-depth multi-attribute fairness, multi-objective optimization, a no-code, chat-based interface, LLM-powered explanations, interactive Pareto exploration for model selection, custom fairness constraint definition, and deployment-ready models in a single open-source toolkit, a combination rarely found in existing fairness tools. Demo walkthrough available at: https://youtu.be/_rcpjlXFqkw.
iMask: Towards a Smart Mask Network Prototype for Monitoring Respiratory Viruses
Although the impact of the COVID-19 pandemic has largely withered away over the years, face masks continue to linger within our society. Still today, they remain a familiar crutch to fall back on when sickness makes its rounds. However, it is often the case that we wear masks when they are not necessary and, more concerningly, fail to wear them when they are truly necessary. Inefficient viral tracking methods further exacerbate the issue, as they often do not alert the public and healthcare officials to oncoming or ongoing outbreaks fast enough. To combat these issues, new smart masks have been developed recently, equipped with biosensors to detect respiratory viruses in the air. However, these masks have their own drawbacks, including limited detection accuracy, detection scope, and beneficiary population. In response, this work presents a first-of-its-kind prototype named iMask to augment current smart masks. By connecting the smartphone of the mask wearer to the Internet, iMask improves detection accuracy. At its core, iMask employs multivariate time series (1) imputation algorithms to alleviate the data scarcity issue and (2) forecasting algorithms to predict future viral levels. Based on shared, imputed, and forecast viral data, iMask further leverages map services to create a viral concentration map that significantly expands the population that benefits.
ReportGRI: Automating GRI Alignment and Report Assessment
Organisations disclose their sustainability performance in corporate sustainability reports (CSRs). CSRs vary widely in structure and depth depending on the reporting framework. Such disparity, together with report complexity and volume, poses significant challenges to transparency, comparability and standardisation. To address this problem, we introduce ReportGRI, an automated system for Global Reporting Initiative (GRI) indexing and qualitative assessment of CSRs. The interactive framework leverages information retrieval techniques and zero-shot prompting to enable GRI disclosure-based report indexing and report coverage assessment by visualising well-covered topics and reporting gaps. The tool facilitates scalable and explainable benchmarking of Environmental, Social and Governance (ESG) reporting quality, enhancing report interpretation, transparency, and corporate accountability. The system is open-sourced on GitHub with an introduction video.
MedSEBA: Synthesizing Evidence-Based Answers Grounded in Evolving Medical Literature
In the digital age, people often turn to the Internet in search of medical advice and recommendations. With the increasing volume of online content, it has become difficult to distinguish reliable sources from misleading information. Similarly, millions of medical studies are published every year, making it challenging for researchers to keep track of the latest scientific findings. These evolving studies can reach differing conclusions, which is not reflected in traditional search tools. To address these challenges, we introduce MedSEBA, an interactive AI-powered system for synthesizing evidence-based answers to medical questions. It utilizes the power of Large Language Models to generate coherent and expressive answers, but grounds them in trustworthy medical studies dynamically retrieved from the research database PubMed. The answers consist of key points and arguments, which can be traced back to respective studies. Notably, the platform also provides an overview of the extent to which the most relevant studies support or refute the given medical claim, and a visualization of how the research consensus evolved through time. Our user study revealed that medical experts and lay users find the system usable and helpful, and the provided answers trustworthy and informative. This makes the system well-suited for both everyday health questions and advanced research insights.
An Interventional Approach to Real-Time Disaster Assessment via Causal Attribution
Traditional disaster analysis and modelling tools for assessing the severity of a disaster are predictive in nature. Based on the past observational data, these tools prescribe how the current input state (e.g., environmental conditions, situation reports) results in a severity assessment. However, these systems are not meant to be interventional in the causal sense, where the user can modify the current input state to simulate counterfactual ''what-if'' scenarios. In this work, we provide an alternative interventional tool that complements traditional disaster modelling tools by leveraging real-time data sources like satellite imagery, news, and social media. Our tool also helps understand the causal attribution of different factors on the estimated severity, over any given region of interest. In addition, we provide actionable recourses that would enable easier mitigation planning. Our source code is publicly available.
JustEva: A Toolkit to Evaluate LLM Fairness in Legal Knowledge Inference
The integration of Large Language Models (LLMs) into legal practice raises pressing concerns about judicial fairness, particularly due to the nature of their ''black-box'' processes. This study introduces JustEva, a comprehensive, open-source evaluation toolkit designed to measure LLM fairness in legal tasks. JustEva features several advantages: (1) a structured label system covering 65 extra-legal factors; (2) three core fairness metrics -- inconsistency, bias, and imbalanced inaccuracy; (3) robust statistical inference methods; and (4) informative visualizations. The toolkit supports two types of experiments, enabling a complete evaluation workflow: (1) generating structured outputs from LLMs using a provided dataset, and (2) conducting statistical analysis and inference on LLMs' outputs through regression and other statistical methods. Empirical application of JustEva reveals significant fairness deficiencies in current LLMs, highlighting the lack of fair and trustworthy LLM legal tools. JustEva offers a convenient tool and methodological foundation for evaluating and improving algorithmic fairness in the legal domain. The toolkit is available for deployment at https://github.com/KYSpring/ai_fairness_demo. A video demonstration of the toolkit is available at https://drive.google.com/file/d/1lB2U3q-kI5B5frv8iqVceVaA9Yks3kE6/view?usp=sharing.
Guess the Age of Photos: An Interactive Web Platform for Historical Image Age Estimation
This paper introduces Guess the Age of Photos, a web platform engaging users in estimating the years of historical photographs through two gamified modes: Guess the Year (predicting a single image's year) and Timeline Challenge (comparing two images to identify the older). Built with Python, Flask, Bootstrap, and PostgreSQL, it uses a 10,150-image subset of the Date Estimation in the Wild dataset (1930-1999). Features like dynamic scoring and leaderboards boost engagement. Evaluated with 113 users and 15,473 gameplays, the platform earned a 4.25/5 satisfaction rating. Users excelled in relative comparisons (65.9% accuracy) over absolute year guesses (25.6% accuracy), with older decades easier to identify. The platform serves as an educational tool, fostering historical awareness and analytical skills via interactive exploration of visual heritage. Furthermore, the platform provides a valuable resource for studying human perception of temporal cues in images and could be used to generate annotated data for training and evaluating computer vision models.
VoiceVisSystem: End-to-End Voice-driven Data Visualization Generation from Natural Language Questions
In today's digital era, data visualization (DV) technology has become indispensable for tasks involving data processing and graphical reasoning. In this demonstration, we introduce a novel automatic DV system named VoiceVisSystem, designed to transform speech-form natural language questions (NLQs) into visual data representations, a task formally known as Speech-to-Vis. Unlike existing cascaded methods (e.g., Sevi), the core component of our system relies on an advanced end-to-end speech-to-vis model named SpeechVisNet, eliminating the need for text as an intermediate medium and directly converting speech into DVs. Specifically, the speech encoder and the text encoder of SpeechVisNet respectively take the user's NLQs and the corresponding database information as inputs and convert them into hidden representations. Then, a grammar-based decoder generates the corresponding DVs as the output. As a result, our system avoids error propagation, thereby enhancing accuracy. By offering a seamless solution for the speech-to-vis task, VoiceVisSystem presents a promising tool for practical applications in various domains. The demonstration video is available at https://1drv.ms/v/s!Ah2vhbolPBFMiSNPZLunJ6Qp6jqU?e=Shyq8R.
CyberBOT: Ontology-Grounded Retrieval Augmented Generation for Reliable Cybersecurity Education
Advancements in large language models (LLMs) have enabled the development of intelligent educational tools that support inquiry-based learning across technical domains. In cybersecurity education, where accuracy and safety are paramount, systems must go beyond surface-level relevance to provide information that is both trustworthy and domain-appropriate. To address this challenge, we introduce CyberBOT (code: https://github.com/rccrdmr/CyberBOT), a question-answering chatbot that leverages a retrieval-augmented generation (RAG) pipeline to incorporate contextual information from course-specific materials and validate responses using a domain-specific cybersecurity ontology. The ontology serves as a structured reasoning layer that constrains and verifies LLM-generated answers, reducing the risk of misleading or unsafe guidance. CyberBOT has been deployed in a large graduate-level course at Arizona State University (ASU), illustrating a promising direction for developing reliable and curriculum-aligned AI applications in specialized educational contexts. A demonstration video is available at https://youtu.be/X2WorBxOQHo.
AGENTiGraph: A Multi-Agent Knowledge Graph Framework for Interactive, Domain-Specific LLM Chatbots
AGENTiGraph is a user-friendly, agent-driven system that enables intuitive interaction and management of domain-specific data through the manipulation of knowledge graphs in natural language. It gives non-technical users a complete, visual solution to incrementally build and refine their knowledge bases, allowing multi-round dialogues and dynamic updates without specialized query languages. The flexible design of AGENTiGraph, including intent classification, task planning, and automatic knowledge integration, ensures seamless reasoning between diverse tasks. Evaluated on a 3,500-query benchmark within an educational scenario, the system outperforms strong zero-shot baselines (achieving 95.12% classification accuracy, 90.45% execution success), indicating potential scalability to compliance-critical or multi-step queries in legal and medical domains, e.g., incorporating new statutes or research on the fly. Our open-source demo offers a powerful new paradigm for multi-turn enterprise knowledge management that bridges LLMs and structured graphs.
OntoLDiff: A Highly Efficient System for Tracking Logical Difference in Large-Scale Ontologies
Modern ontologies undergo continuous evolution to accommodate new domain knowledge, correct modeling errors, and adapt to changing user requirements. Monitoring these changes is crucial for maintaining ontology quality and understanding the semantic impact of modifications on dependent systems and applications. This paper describes OntoLDiff, a highly efficient system for tracking the logical difference between two ontologies formulated in the description logic ELH. Intuitively, the logical difference between two versions of an ontology refers to the set of axioms entailed by one version but not the other, indicating the information gain or loss between them. The set of such axioms, referred to as 'witnesses', can be infinite, making logical difference computation infeasible. To address this challenge, OntoLDiff employs a Uniform Interpolation (UI) approach to compute a finite representation of these axioms. Instead of attempting to compute all entailments of one ontology that do not hold in the other, which would be computationally infeasible, the UI-based approach identifies only the strongest entailments, from whose deductive closure all witnesses can in principle be recovered. Despite UI's computational complexity, OntoLDiff is currently the only tool that efficiently tracks logical differences in industrial-scale ontologies, enabling ontology curators to precisely identify meaningful changes during ontology evolution.
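For readers less familiar with the notion, one standard formulation of the signature-restricted logical difference (notation may differ from OntoLDiff's) is

\[
\mathsf{Diff}_{\Sigma}(\mathcal{O}_1, \mathcal{O}_2) \;=\; \{\, \alpha \in \mathcal{L}_{\Sigma} \mid \mathcal{O}_1 \models \alpha \ \text{and}\ \mathcal{O}_2 \not\models \alpha \,\},
\]

where \(\mathcal{L}_{\Sigma}\) denotes the ELH axioms built over a signature \(\Sigma\) of interest. A uniform interpolant of \(\mathcal{O}_1\) for \(\Sigma\) is an ontology stated entirely over \(\Sigma\) whose \(\Sigma\)-entailments coincide with those of \(\mathcal{O}_1\); the possibly infinite witness set above can then be represented by, and recovered from, the deductive closure of a finite interpolant when one exists.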
AppAgent-Pro: A Proactive GUI Agent System for Multidomain Information Integration and User Assistance
Large language model (LLM)-based agents have demonstrated remarkable capabilities in addressing complex tasks, thereby enabling more advanced information retrieval and supporting deeper, more sophisticated human information-seeking behaviors. However, most existing agents operate in a purely reactive manner, responding passively to user instructions, which significantly constrains their effectiveness and efficiency as general-purpose platforms for information acquisition. To overcome this limitation, this paper proposes AppAgent-Pro, a proactive GUI agent system that actively integrates multi-domain information based on user instructions. This approach enables the system to proactively anticipate users' underlying needs and conduct in-depth multi-domain information mining, thereby facilitating the acquisition of more comprehensive and intelligent information. AppAgent-Pro has the potential to fundamentally redefine information acquisition in daily life, leading to a profound impact on human society. Our code is available at: https://github.com/LaoKuiZe/AppAgent-Pro. The demonstration video can be found at: https://www.dropbox.com/scl/fi/hvzqo5vnusg66srydzixo/AppAgent-Pro-demo-video.mp4?rlkey=o2nlfqgq6ihl125mcqg7bpgqu&st=d29vrzii&dl=0.
TrustMap: Mapping Truthfulness Stance of Social Media Posts on Factual Claims for Geographical Analysis
Factual claims and misinformation circulate widely on social media, shaping public opinion and decision-making. The concept of truthfulness stance refers to whether a text affirms a claim as true, rejects it as false, or takes no clear position. Capturing such stances is essential for understanding how the public engages with and propagates misinformation. We present TrustMap, an application that identifies and visualizes stances of tweets toward factual claims. Users may input factual claims or select claims from a curated set. For each claim, TrustMap retrieves relevant social media posts and applies a retrieval-augmented approach with fine-tuned language models to classify stance. Posts are classified as positive, negative, or neutral/no stance. These classifications are then aggregated by location to reveal regional variations in public opinion. To enhance interpretability, TrustMap uses large language models to generate stance explanations for individual posts and to produce regional stance summaries. By integrating retrieval-augmented truthfulness stance detection with geographical visualization, TrustMap provides the first tool of its kind for exploring how belief in factual claims varies across regions.
SESSION: PhD Symposium
Towards Rational Pesticide Design with Graph Machine Learning Models for Ecotoxicology
This research focuses on rational pesticide design, using graph machine learning to accelerate the development of safer, eco-friendly agrochemicals, inspired by in silico methods in drug discovery. With an emphasis on ecotoxicology, the initial contributions include the creation of ApisTox, the largest curated dataset on pesticide toxicity to honey bees. We conducted a broad evaluation of machine learning (ML) models for molecular graph classification, including molecular fingerprints, graph kernels, GNNs, and pretrained transformers. The results show that methods successful in medicinal chemistry often fail to generalize to agrochemicals, underscoring the need for domain-specific models and benchmarks. Future work will focus on developing a comprehensive benchmarking suite and designing ML models tailored to the unique challenges of pesticide discovery.
Towards Trustworthy AI: Enhancing Factuality, Bias, and Compliance in LLMs
Large Language Models (LLMs) have demonstrated impressive capabilities across natural language tasks, yet critical concerns persist regarding their factual reliability, societal bias, and alignment with regulatory norms. Central to addressing these challenges is the ability to systematically extract, normalize, and rank the claims made by LLMs, whether factual, normative, or policy-relevant. However, existing approaches often assume that claims are self-contained within individual sentences, overlooking the reality that many important claims emerge only through multi-sentence context. This leads to fragmented analysis and underestimates the complexity of model behavior. Furthermore, current methods are limited in scope, often relying on narrow domains and fixed knowledge sources, and struggle to identify or prioritize claims with potential for social harm or legal noncompliance. By advancing methods for context-aware claim extraction, standardization across sensitive attributes and regulatory categories, and risk-informed ranking, this research aims to provide a more comprehensive foundation for evaluating and auditing LLM outputs. Such a framework is essential for building systems that are not only factually grounded, but also fair, transparent, and compliant in high-stakes applications.
Reasoning over Incomplete Knowledge Graphs
Incomplete knowledge graphs present a fundamental challenge for reliable multi-hop knowledge graph question answering (KGQA), causing reasoning failures when key factual triples are missing. While large language models (LLMs) offer strong reasoning capabilities for KGQA, they are prone to hallucinations and often assume complete knowledge graphs (KGs). This research identifies key bottlenecks in current LLM-KGQA pipelines caused by KG incompleteness. We propose targeted remedies, centered on integrating link prediction tools, to enhance performance in sparse KGs. We explore two main directions: (1) improving the robustness of LLM-KGQA methods under KG sparsity, and (2) leveraging advanced link prediction techniques to recover missing graph connections. Preliminary experiments on benchmark datasets demonstrate significant improvements in both answer accuracy and link prediction performance under simulated KG sparsity. These results bridge the gap between LLM-based reasoning and incomplete KGs, laying the foundation for more faithful and interpretable KGQA systems.
DeepEyeNet: Generating Medical Report for Retinal Images
The increasing prevalence of retinal diseases poses a significant challenge to the healthcare system, as the demand for ophthalmologists surpasses the available workforce. This imbalance creates a bottleneck in diagnosis and treatment, potentially delaying critical care. Traditional methods of generating medical reports from retinal images rely on manual interpretation, which is time-consuming and prone to errors, further straining ophthalmologists' limited resources. This thesis investigates the potential of Artificial Intelligence (AI) to automate medical report generation for retinal images. AI can quickly analyze large volumes of image data, identifying subtle patterns essential for accurate diagnosis. By automating this process, AI systems can greatly enhance the efficiency of retinal disease diagnosis, reducing doctors' workloads and enabling them to focus on more complex cases. The proposed AI-based methods address key challenges in automated report generation: (1) A multi-modal deep learning approach captures interactions between textual keywords and retinal images, resulting in more comprehensive medical reports; (2) Improved methods for medical keyword representation enhance the system's ability to capture nuances in medical terminology; (3) Strategies to overcome RNN-based models' limitations, particularly in capturing long-range dependencies within medical descriptions; (4) Techniques to enhance the interpretability of the AI-based report generation system, fostering trust and acceptance in clinical practice. These methods are rigorously evaluated using various metrics and achieve state-of-the-art performance. This thesis demonstrates AI's potential to revolutionize retinal disease diagnosis by automating medical report generation, ultimately improving clinical efficiency, diagnostic accuracy, and patient care. DeepEyeNet Project Github: https://github.com/Jhhuangkay/DeepOpht-Medical-Report-Generation-for-Retinal-Images-via-Deep-Models-and-Visual-Explanation
Graph Neural Network Architecture Search via Hybrid Genetic Algorithm with Parallel Tempering
In recent years, there has been a surge of interest in harnessing graph neural networks (GNNs) for graph classification tasks. Despite the strong performance of manually designed GNN architectures, their development often relies on time-consuming, expert-driven trial and error, which may overlook promising design opportunities. To address this challenge, we propose a hybrid Genetic Algorithm and Parallel Tempering (GA+PT) framework for automated GNN architecture search. Our method systematically encodes a rich design space, including convolutional layer types (GCN, GraphSAGE, GAT, GIN, SGC), attention heads, hidden-layer widths, dropout rates, weight-initialization schemes, learning rates, and classifier structures, into evolutionary genotypes. A population-based GA explores this space via crossover and mutation, while an inner PT Markov Chain Monte Carlo procedure adaptively refines individual solutions under a temperature schedule to escape local optima. We validate our approach on benchmark graph datasets, demonstrating its ability to discover high-performing architectures that balance classification accuracy, macro-F1 score, and model complexity. The proposed GA+PT hybrid offers a robust, scalable, and resource-aware alternative to manual GNN design and purely gradient-based neural architecture search methods.
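The interplay between the outer GA and the inner parallel-tempering refinement can be sketched as below; the design space is truncated and the fitness function is a placeholder for actually training and validating the encoded GNN.

# Minimal sketch of a GA + parallel-tempering hybrid search loop over a
# hypothetical GNN design space; fitness() is a stand-in for model training.
import random, math

SPACE = {"conv": ["GCN", "GraphSAGE", "GAT", "GIN", "SGC"],
         "hidden": [32, 64, 128, 256],
         "dropout": [0.0, 0.2, 0.5],
         "lr": [1e-3, 5e-4, 1e-4]}

def random_genotype():
    return {k: random.choice(v) for k, v in SPACE.items()}

def fitness(g):
    # placeholder for "train the encoded GNN and return validation accuracy"
    return random.random()

def mutate(g):
    g = dict(g)
    key = random.choice(list(SPACE))
    g[key] = random.choice(SPACE[key])
    return g

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

def parallel_tempering(g, temps=(0.05, 0.2, 1.0), steps=20):
    # one replica per temperature; hotter replicas accept worse moves more often
    replicas = [(dict(g), fitness(g)) for _ in temps]
    for _ in range(steps):
        for i, T in enumerate(temps):
            cand = mutate(replicas[i][0])
            f = fitness(cand)
            if f > replicas[i][1] or random.random() < math.exp((f - replicas[i][1]) / T):
                replicas[i] = (cand, f)
        for i in range(len(temps) - 1):          # Metropolis swap between adjacent temperatures
            (g1, f1), (g2, f2) = replicas[i], replicas[i + 1]
            if random.random() < min(1.0, math.exp((f2 - f1) / temps[i] + (f1 - f2) / temps[i + 1])):
                replicas[i], replicas[i + 1] = replicas[i + 1], replicas[i]
    return max(replicas, key=lambda r: r[1])[0]

population = [random_genotype() for _ in range(10)]
for _ in range(5):                               # outer GA generations
    parents = sorted(population, key=fitness, reverse=True)[:4]
    children = [mutate(crossover(*random.sample(parents, 2))) for _ in range(6)]
    population = parents + [parallel_tempering(c) for c in children]
print(max(population, key=fitness))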
Real until proven fake - Source-Level Audio Deepfake Detection (with PIPNet)
The rapid development of synthetic speech technologies poses significant challenges to digital security and authenticity verification. This work investigates the use of prototype-based neural networks for detecting and classifying audio deepfakes by tracing them back to their generative source. Using time-frequency representations of speech (spectrograms, mel-spectrograms, MFCC), we evaluated model performance across multilingual and monolingual setups. Our results demonstrate that PIPNet reliably distinguishes between real and synthetic speech and effectively identifies the source TTS generator, making it a strong candidate for source-level attribution in audio deepfake detection.
Investigating the Usage and Evaluation of Quantum Computing Technologies for Information Access
Quantum Computing (QC) is an emerging research field that is attracting significant interest from the scientific community. In fact, it is believed that quantum computers can be employed to solve complex computational problems more efficiently than traditional computers, thanks to their inherent capability to explore large search spaces by leveraging the principles of quantum mechanics, such as superposition, entanglement, and tunnelling. However, quantum computers are still in their early stages of development and their applications are very limited, especially in the field of Information Access (IA). Nevertheless, IA systems often face complex optimization problems that might be solved more efficiently through the usage of quantum computers. This paper outlines the author's PhD objectives in designing new methodologies for the application and evaluation of QC technologies for IA problems. Furthermore, this work provides an overview of the preliminary results achieved so far and a discussion of possible future research directions.
The Landscape of Foundation Models for Molecular Chemistry
Pre-trained neural networks have recently emerged as powerful tools for molecular data mining, offering an alternative to classical approaches. However, these models are often evaluated on limited datasets with narrow baselines, leaving their benefits unclear. We present the first large-scale benchmark comparing pre-trained molecular embedding models across 20 public datasets spanning classification and regression tasks. Our evaluation covers text-based, graph-based, and multimodal architectures, all tested under a unified methodology. The results show that classical fingerprint-based models remain highly competitive, and only a few pre-trained models consistently exceed these baselines. We also highlight key factors influencing model performance, offering practical guidance for model selection and future improvements in molecular embeddings.
Eliminating Bias from Presentation Attack Detection Algorithms for Face Recognition Systems
This paper presents an analysis of the fairness of presentation attack detection (PAD) algorithms in face recognition systems. The study is the initial part of my Ph.D. research, with the aim of developing robust and unbiased PAD methodologies. One of the main aspects of this work involves the manual annotation of demographic groups within the biometric datasets used, as the automated tools we tested do not offer sufficient accuracy for such labeling tasks. Another part of this work was the evaluation of two open-source PAD algorithms. These experiments reveal performance disparities across different demographic groups, highlighting the presence of algorithmic bias. Future stages of my research will explore the impact of sensor characteristics, specifically near-infrared (NIR) imaging, on demographic bias, with the ultimate goal of designing a bias-resilient PAD algorithm.
Explainable Numerical Claim Verification
The rapid proliferation of mis- and disinformation in the digital age highlights the urgent need for scalable, transparent, and trustworthy automated fact-checking systems. Large Language Models (LLMs) offer strong language understanding capabilities but suffer from opacity and brittleness, particularly in reasoning over numerical claims. This work explores how Explainable Artificial Intelligence (XAI), through the lens of counterfactual explanations and adversarial training, can be used to systematically evaluate and improve the robustness of LLMs against perturbed numerical inputs. We propose a framework that employs counterfactual generation to both probe LLM reliability and generate user-appropriate explanations. Through empirical evaluations using a large-scale numerical fact-checking dataset (QuanTemp), we show that even state-of-the-art LLMs are susceptible to subtle numerical perturbations, impacting verdict accuracy. Our methodology contributes a dual-purpose diagnostic and training strategy that not only bolsters robustness but also enables both global and local interpretability, thereby improving explainability in automated fact-checking systems.
Toward Robust Machine Learning under Diverse Incomplete Data Mechanisms in Real-World Applications
Incomplete data is a pervasive challenge across a wide range of data types, including tabular, sensor, time-series, image, and textual data. Its presence stems from various real-world factors and gives rise to different missingness mechanisms. While much of the existing research focuses on the Missing Completely At Random (MCAR) assumption, the more complex and realistic mechanisms, Missing At Random (MAR) and Missing Not At Random (MNAR), remain relatively underexplored despite their prevalence and impact. This PhD project aims to systematically investigate the challenges posed by diverse incomplete-data mechanisms and to develop robust machine learning methods that can perform reliably across MCAR, MAR, and MNAR scenarios. The research spans multiple data modalities and focuses on improving both the theoretical understanding and practical handling of incomplete data. By addressing mechanism-specific imputation challenges and proposing broadly applicable solutions, this work contributes to building more resilient and trustworthy data-driven systems in real-world settings.
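The three mechanisms can be contrasted with a tiny simulation: the same column is masked under MCAR, MAR, and MNAR rules (the logistic thresholds below are arbitrary), and only MCAR leaves the observed mean unbiased.

# Toy illustration of MCAR / MAR / MNAR masking on a synthetic table.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))          # column 0: observed covariate, column 1: target of masking

# MCAR: entries of column 1 go missing uniformly at random
mcar = rng.random(1000) < 0.3

# MAR: missingness in column 1 depends only on the *observed* column 0
mar = rng.random(1000) < 1 / (1 + np.exp(-2 * X[:, 0]))

# MNAR: missingness depends on the (unobserved) value itself
mnar = rng.random(1000) < 1 / (1 + np.exp(-2 * X[:, 1]))

for name, mask in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    observed = X[~mask, 1]
    print(f"{name}: {mask.mean():.0%} missing, mean of observed values = {observed.mean():+.2f}")
# Under MCAR the observed mean stays near 0; under MNAR it is visibly biased (negative here).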
SESSION: Tutorials
Towards Large Generative Recommendation: A Tokenization Perspective
The emergence of large generative models is transforming the landscape of recommender systems. One of the most fundamental components in building these models is action tokenization, the process of converting human-readable data (e.g., user-item interactions) into machine-readable formats (e.g., discrete token sequences). In this tutorial, we present a comprehensive overview of existing action tokenization techniques, which convert actions to (1) item IDs, (2) textual descriptions, and (3) semantic IDs. We then provide an in-depth discussion of the challenges and open questions of building large generative recommendation models from the perspective of action tokenization. Materials of this tutorial are available at: https://large-genrec.github.io/.
Socially Responsible and Trustworthy Generative Foundation Models: Principles, Challenges, and Practices
Generative foundation models (GenFMs), including large language and multimodal models, are transforming information retrieval and knowledge management. However, their rapid adoption raises urgent concerns about social responsibility, trustworthiness, and governance. This tutorial offers a comprehensive, hands-on overview of recent advances in responsible GenFMs, covering foundational concepts, multi-dimensional risk taxonomies (including safety, privacy, robustness, truthfulness, fairness, and machine ethics), state-of-the-art evaluation benchmarks, and effective mitigation strategies. We integrate real-world case studies and practical exercises using open-source tools, and present key perspectives from both policy and industry, including recent regulatory developments and enterprise practices. The session concludes with a discussion of open challenges, providing actionable guidance for the CIKM community.
A Tutorial on Hypergraph Neural Networks: An In-Depth and Step-By-Step Guide
Higher-order interactions (HOIs) are ubiquitous in real-world networks, such as group discussions on online Q&A platforms, co-purchases of items in e-commerce, and collaborations of researchers. Investigation of deep learning for networks of HOIs, expressed as hypergraphs, has become an important agenda for the data mining and machine learning communities. As a result, hypergraph neural networks (HNNs) have emerged as a powerful tool for representation learning on hypergraphs. Given this emerging trend, we provide a timely tutorial dedicated to HNNs. We cover the following topics: (1) inputs, (2) message passing schemes, (3) training strategies, (4) applications (e.g., recommender systems and time series analysis), and (5) open problems of HNNs. This tutorial is intended for researchers and practitioners who are interested in hypergraph representation learning and its applications.
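As a concrete reference point for the message-passing topic listed above, the sketch below implements the common two-stage (node-to-hyperedge, hyperedge-to-node) aggregation directly on an incidence matrix; the example hypergraph and layer sizes are arbitrary.

# Minimal sketch of two-stage hypergraph message passing on an incidence matrix.
import numpy as np

# incidence matrix H: 5 nodes x 3 hyperedges (H[v, e] = 1 if node v belongs to hyperedge e)
H = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
X = np.random.randn(5, 4)                     # initial node features
W = np.random.randn(4, 4) * 0.1               # learnable weight (fixed here for the sketch)

def hnn_layer(X, H, W):
    edge_deg = H.sum(axis=0, keepdims=True)   # number of member nodes per hyperedge
    node_deg = H.sum(axis=1, keepdims=True)   # number of incident hyperedges per node
    E = (H.T @ X) / edge_deg.T                # stage 1: aggregate member nodes into hyperedge messages
    X_new = (H @ E) / node_deg                # stage 2: aggregate incident hyperedges back to nodes
    return np.tanh(X_new @ W)

print(hnn_layer(X, H, W).shape)               # (5, 4)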
Generative Models for Synthetic Data: Transforming Data Mining in the GenAI Era
Generative models such as Large Language Models, Diffusion Models, and generative adversarial networks have recently revolutionized the creation of synthetic data, offering scalable solutions to data scarcity, privacy, and annotation challenges in data mining. This tutorial introduces the foundations and latest advances in synthetic data generation, covers key methodologies and practical frameworks, and discusses evaluation strategies and applications. Attendees will gain actionable insights into leveraging generative synthetic data to enhance data mining research and practice. More information can be found on our website: https://syndata4dm.github.io/.
Neural Differential Equations for Continuous-Time Analysis
Modeling complex, irregular time series is a critical challenge in knowledge discovery and data mining. This tutorial introduces Neural Differential Equations (NDEs), a powerful paradigm for continuous-time deep learning that intrinsically handles the non-uniform sampling and missing values where traditional models falter. We provide a comprehensive review of the theory and practical application of the entire NDE family: Neural Ordinary (NODEs), Controlled (NCDEs), and Stochastic (NSDEs) Differential Equations. The tutorial emphasizes robustness and stability and culminates in a hands-on session where participants will use key open-source libraries to solve real-world tasks like interpolation and classification. Designed for AI researchers and practitioners, this tutorial equips attendees with essential tools for time series analysis.
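A minimal NODE sketch in PyTorch is given below; it hand-rolls a fixed-step RK4 integrator (rather than assuming any particular ODE-solver library), so the latent state can be evaluated at irregular timestamps.

# Minimal Neural ODE sketch: a small neural vector field integrated with RK4.
import torch
import torch.nn as nn

class VectorField(nn.Module):
    def __init__(self, dim=3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 32), nn.Tanh(), nn.Linear(32, dim))

    def forward(self, t, z):
        # dz/dt = f_theta(t, z); time is appended as an extra input feature
        t_col = torch.full((z.size(0), 1), float(t))
        return self.net(torch.cat([z, t_col], dim=-1))

def rk4_step(f, t, z, dt):
    k1 = f(t, z)
    k2 = f(t + dt / 2, z + dt / 2 * k1)
    k3 = f(t + dt / 2, z + dt / 2 * k2)
    k4 = f(t + dt, z + dt * k3)
    return z + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def odeint(f, z0, times, substeps=4):
    # integrate from one (possibly irregular) timestamp to the next
    zs, z = [z0], z0
    for t0, t1 in zip(times[:-1], times[1:]):
        dt = (t1 - t0) / substeps
        for i in range(substeps):
            z = rk4_step(f, t0 + i * dt, z, dt)
        zs.append(z)
    return torch.stack(zs)                       # (len(times), batch, dim)

f = VectorField()
times = torch.tensor([0.0, 0.3, 0.35, 1.2])      # non-uniformly sampled timestamps
trajectory = odeint(f, torch.randn(8, 3), times)
print(trajectory.shape)                          # torch.Size([4, 8, 3])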
Retrieval of Graph Structured Objects: Theory and Applications
Graph-structured data is ubiquitous across diverse domains like social networks, search, question answering, and drug discovery. Effective retrieval of (sub-)graphs with relevant substructures has become critical to the success of these applications. This tutorial will introduce attendees to state-of-the-art neural methods for graph retrieval, highlighting architectures that effectively model relevance through innovative combinations of early and late interaction mechanisms. Participants will explore relevance models that represent graphs as sets of embeddings, enabling alignment-driven similarity scoring between query and corpus graphs and supporting diverse cost functions, both symmetric and asymmetric. We will also discuss compatibility with Approximate Nearest Neighbor (ANN) methods, covering recent advances in locality-sensitive hashing (LSH) and other indexing techniques that significantly enhance scalability in graph retrieval. The tutorial includes hands-on experience with an accessible, PyTorch-integrated toolkit that provides downloadable graph retrieval datasets and baseline implementations of recent methods. Participants will learn to adapt these methods for multi-modal applications --- such as molecule, text, and image retrieval --- where graph-based retrieval proves particularly effective. Designed for researchers and practitioners, this session delivers both foundational concepts and practical tools for implementing and scaling neural graph retrieval solutions across interdisciplinary applications.
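The set-of-embeddings relevance scoring mentioned above can be sketched as a simple late-interaction score: each query-node embedding is aligned to its best-matching corpus node and the similarities are summed, giving an asymmetric score suited to subgraph-style relevance. Dimensions and data below are arbitrary.

# Minimal sketch of late-interaction scoring between node-embedding sets.
import numpy as np

def late_interaction_score(query_nodes: np.ndarray, corpus_nodes: np.ndarray) -> float:
    # cosine-normalize both embedding sets
    q = query_nodes / np.linalg.norm(query_nodes, axis=1, keepdims=True)
    c = corpus_nodes / np.linalg.norm(corpus_nodes, axis=1, keepdims=True)
    sims = q @ c.T                        # pairwise node-node similarities
    return float(sims.max(axis=1).sum())  # best alignment per query node, then sum

rng = np.random.default_rng(0)
query = rng.normal(size=(4, 16))          # 4-node query graph
corpus = [rng.normal(size=(n, 16)) for n in (6, 10, 3)]
scores = [late_interaction_score(query, g) for g in corpus]
print(np.argsort(scores)[::-1])           # corpus graphs ranked by alignment score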
Neural Shifts in Collaborative Team Recommendation
Team recommendation involves selecting skilled experts to form an almost surely successful collaborative team, or refining an existing team's composition to maintain or improve its performance. To eschew the tedious and error-prone manual process, various computational and social science theoretical approaches have been proposed wherein the problem definition remains essentially the same, while it has been referred to by other names such as team allocation, selection, composition, and formation. In this tutorial, we study the advancement of computational approaches from greedy search in pioneering works to recent learning-based approaches, with a particular in-depth exploration of graph neural network-based methods as the cutting-edge class, via unifying definitions, formulations, and evaluation schema. More importantly, we then discuss team refinement, a subproblem in team recommendation that involves structural adjustments or expert replacements to enhance team performance in dynamic environments. Finally, we introduce training strategies, benchmarking datasets, and open-source tools, along with future research directions and real-world applications. The tutorial artifacts can be found at https://fani-lab.github.io/OpeNTF/tutorial/cikm25.
Fairness in Language Models: A Tutorial
Language Models (LMs) achieve outstanding performance across diverse applications but often produce biased outcomes, raising concerns about their trustworthy deployment. These concerns call for fairness research specific to LMs; however, most existing work in machine learning assumes access to model internals or training data, conditions that rarely hold in practice. As LMs continue to exert growing societal influence, it becomes increasingly important to understand and address fairness challenges unique to these models. To this end, our tutorial begins by showcasing real-world examples of bias to highlight their practical implications and uncover underlying sources. We then define fairness concepts tailored to LMs, review methods for bias evaluation and mitigation, and present a multi-dimensional taxonomy of benchmark datasets for fairness assessment. We conclude by outlining open research challenges, aiming to provide the community with both conceptual clarity and practical tools for fostering fairness in LMs. All tutorial resources are publicly accessible at https://github.com/vanbanTruong/fairness-in-large-language-models.
Uncertain Boundaries: A Tutorial on Copyright Challenges and Cross-Disciplinary Solutions for Generative AI
As generative artificial intelligence (AI) becomes increasingly prevalent in creative industries, intellectual property issues have come to the forefront, especially regarding AI-generated content that closely resembles human-created works. Recent high-profile incidents involving AI-generated outputs reproducing copyrighted materials underscore the urgent need to reassess current copyright frameworks and establish effective safeguards against infringement. To this end, this tutorial provides a structured overview of copyright challenges in generative AI across the entire development lifecycle. It begins by outlining key copyright principles relevant to generative models, then explores methods for detecting and evaluating potential infringement in generated outputs. The session also introduces strategies to safeguard creative content and training data from unauthorized replication, including mitigation techniques during model training. Finally, it reviews existing regulatory frameworks, highlights unresolved research questions, and offers recommendations to guide future work in this evolving area.
Continual Recommender Systems
Modern recommender systems operate in uniquely dynamic settings: user interests, item pools, and popularity trends shift continuously, and models must adapt in real time without forgetting past preferences. While existing tutorials on continual or lifelong learning cover broad machine learning domains (e.g., vision and graphs), they do not address recommendation-specific demands, such as balancing stability and plasticity per user, handling cold-start items, and optimizing recommendation metrics under streaming feedback. This tutorial aims to make a timely contribution by filling that gap. We begin by reviewing the background and problem settings, followed by a comprehensive overview of existing approaches. We then highlight recent efforts to apply continual learning to practical deployment environments, such as resource-constrained systems and sequential interaction settings. Finally, we discuss open challenges and future research directions. We expect this tutorial to benefit researchers and practitioners in recommender systems, data mining, AI, and information retrieval across academia and industry.
SESSION: Industry Day Talks
Autoregressive Generative Retrieval for Industrial-Scale Recommendations at Pinterest
Generative retrieval methods utilize generative sequential modeling techniques, such as transformers, to generate candidate items for recommender systems. These methods have demonstrated promising results in academic benchmarks, surpassing traditional retrieval models like two-tower architectures. However, current generative retrieval methods lack the scalability required for industrial recommender systems, and they are insufficiently flexible to satisfy the multiple metric requirements of modern systems. This talk introduces PinRec, a novel generative retrieval model developed for applications at Pinterest. PinRec utilizes outcome-conditioned generation, enabling modelers to specify how to balance various outcome metrics, such as the number of saves and clicks, to effectively align with business goals and user exploration. Additionally, PinRec incorporates multi-token generation to enhance output diversity while optimizing generation. Our experiments demonstrate that PinRec successfully balances performance, diversity, and efficiency, delivering a significant positive impact to users, including +2% sitewide clicks and +4% search repins. To our knowledge, this talk presents the first in-depth study on productionizing generative retrieval at the scale of Pinterest.
Building Trustworthy Peer Review Quality Assessment Systems
Peer review is foundational to academic publishing, yet the quality of reviews remains difficult to assess at scale due to subjectivity, inconsistency, and the lack of standardized evaluation mechanisms. This talk presents our experience developing and deploying a scalable framework for assessing review quality in operational settings. We combine two complementary approaches: interpretable machine learning models built on quantifiable review- and reviewer-level features, and the application of large language models (LLMs), including Qwen, Phi, and GPT-4o, in zero- and few-shot configurations for textual quality evaluation. We also explore the fine-tuning of LLMs on expert-annotated datasets to examine their upper-bound capabilities. To benchmark these methods, we constructed a dataset of over 700 paper-review pairs labeled by domain experts across multiple quality dimensions. Our findings demonstrate that transparent, feature-based models consistently outperform LLMs in reliability and generalization, particularly when evaluating conceptual depth and argumentative structure. The talk will highlight key engineering choices, deployment challenges, and broader implications for integrating automated review evaluation into scholarly workflows.
Safeguarding Generative AI Applications in Preclinical Imaging through Hybrid Anomaly Detection
Generative artificial intelligence (GenAI) holds great potential to automate and enhance data synthesis in nuclear medicine. However, the high-stakes nature of biomedical imaging necessitates robust mechanisms to detect and manage unexpected or erroneous model behavior. We report the development and implementation of a hybrid anomaly detection framework to safeguard GenAI models in BIOEMTECH's eyes™ systems. Two applications are demonstrated: Pose2Xray, which generates synthetic X-rays from photographic mouse images, and DosimetrEYE, which estimates 3D radiation dose maps from 2D SPECT/CT scans. In both cases, our outlier detection (OD) enhances reliability, reduces manual oversight, and supports real-time quality control. This approach strengthens the industrial viability of GenAI in preclinical settings by increasing robustness, scalability, and regulatory compliance.
Motion-Based Bird-UAV Classification Using 3D-CNN for Long-Range Anti-UAV Systems
The increasing threat of malicious unmanned aerial vehicles (UAVs) necessitates robust anti-UAV systems. However, their performance is often degraded by bird misclassification caused by low-resolution imagery and unseen UAV types. This study proposes a motion-based 3D convolutional neural network (3D-CNN) trained on image sequences acquired from a radar-camera integrated anti-UAV solution. The proposed method effectively distinguishes UAVs from birds, even under low-resolution conditions and when encountering previously unseen UAV types.
Taming the Unicorn: Turning Generative AI Into a Workhorse - How to Draw Boundaries, Handle Hallucinations, and Make AI Behave in Product
This work illustrates the process of turning speculative AI prototypes into reliable, production-grade systems. In particular, we discuss how we redesigned our search engine from keyword-based matching to a generative AI framework that interprets natural language queries and engages users through dynamic question answering, to provide a personalized experience. To build this framework, we created highly cross-functional processes, where content design, user experience research, engineering, and science worked iteratively in unison. In short, we do not chase unicorns; we put them to work safely and predictably.
ROI Scan: LLM-powered Object-level Similarity Search for Google Ads Content Moderation
Google's ad platform reviews a massive volume of ads daily, aiming to maintain user trust and platform integrity. However, malicious advertisers often bypass the existing detection by subtly manipulating suspended ads, typically by placing a policy-violating region of interest (ROI) into diverse images with varied backgrounds and sizes. To counter this, we introduce ROI Scan, a novel two-stage object-level similarity search approach. Stage 1 precisely identifies and extracts the problematic ROIs from escalated ad images using LLM-powered suggestions, refined by human expert review. Stage 2 then matches these ROIs against a massive database of billions of objects extracted from production images. Our experiments on the Online Gambling policy demonstrate ROI Scan's effectiveness, achieving 89.1% relative recall and 83.8% incremental coverage over the full-image search baseline, with nearly 100% precision. In production, ROI Scan prevents hundreds of millions of policy violation impressions and blocks hundreds of thousands of bad ads weekly.
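The two-stage idea can be sketched as follows; the embed function, the bounding box, and the object index below are hypothetical stand-ins for the production components.

# Minimal sketch of ROI extraction followed by embedding-based object matching.
import numpy as np

rng = np.random.default_rng(1)

def embed(image_crop: np.ndarray) -> np.ndarray:
    # placeholder for a visual embedding model applied to the cropped ROI
    v = image_crop.mean(axis=(0, 1))
    return v / np.linalg.norm(v)

# Stage 1: a reviewed bounding box isolates the violating region in an escalated ad image
escalated_image = rng.random((256, 256, 8))
roi = escalated_image[40:120, 60:180]
roi_vec = embed(roi)

# Stage 2: cosine similarity against a (toy) index of object embeddings from production images
object_index = rng.random((10_000, 8))
object_index /= np.linalg.norm(object_index, axis=1, keepdims=True)
sims = object_index @ roi_vec
matches = np.where(sims > 0.99)[0]       # a high threshold keeps precision near 100%
print(f"{len(matches)} candidate ads share an ROI-like object")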
Using Large Language Models to Improve Product Information in E-commerce Catalogs
To give customers a good experience, an e-commerce retailer needs high-quality product information in its catalog. Yet, the raw product information often lacks sufficient quality. For a large catalog that can contain billions of products, manually fixing this information is highly labor-intensive. To address this issue, we propose using the tool-use functionality of large language models to automatically improve product information. In this talk, we show why existing data cleaning methods are not well suited for this task and how we designed our automated system to improve product information. When evaluated on a random sample of products from an e-commerce catalog, our system improved product information completeness by 78% with no major drop in information accuracy.
Semantic Filter Recommendation for eCommerce Search
We present an application of encoder-only Transformers to the task of recommending filters to eCommerce search queries. In particular, we operate in a dynamic setup where new recommendations are computed online whenever the user selects one or more filters, conditioned on the search query and the filters selected so far. Our method leverages the world knowledge imparted into a pretrained model, setting it apart from purely memory-based or statistical models, which we use as baselines for evaluation. We review experimental results on offline benchmarks using data generated from eBay search logs, comparing the performance of the proposed model to the baselines. The results show a significant increase in filter recommendation accuracy, as measured by NDCG.
LLM-Driven Attributes Extraction in eCommerce
Aspect extraction - the task of identifying attributes such as model, color, or size from textual entities like question-answer pairs - is one of the key tasks in eCommerce. Given the large number of possible aspects per entity, retrieving the most relevant ones is challenging. Traditional methods rely on high-quality labeled data for training, which is costly to obtain at scale. In this work, we propose a training-free aspect extraction approach using LLMs. Leveraging in-context learning and a novel Forward-Backward method that combines retrieval-augmented generation (RAG) with embedding-based matching, our method effectively extracts relevant aspects from text without requiring training data.
Google Ads Content Moderation with RAG
Keeping ad content policy classifiers up to date while maintaining a high quality bar is a significant challenge, especially with new threats emerging constantly. This paper introduces a new application of RAG-inspired in-context learning to accelerate content policy enforcement, particularly when mitigating newly emerging violations. Our application leverages RAG-based LLM inference for classification tasks and incorporates augmented reasoning information for better performance. We also developed a practical framework to enforce new violation patterns in O(1) days, demonstrating improved memorization and generalization capabilities compared to traditional parametric and non-parametric models.
TransAct V2: Lifelong User Action Sequence Modeling on Pinterest Recommendation
Modeling user action sequences has become a popular focus in industrial recommendation system research, particularly for Click-Through Rate (CTR) prediction tasks. However, industry-scale CTR models often rely on short user sequences, limiting their ability to capture long-term behavior. They also rarely address the infrastructure challenges involved in efficiently serving large-scale sequential models. Additionally, these models typically lack an integrated action-prediction task within a point-wise ranking framework, reducing their predictive power. We introduce TransAct V2, a production model for Pinterest's Homefeed ranking system, featuring three key innovations: (1) leveraging very long user sequences to improve CTR predictions, (2) employing scalable, low-latency deployment solutions tailored to handle the computational demands of extended user action sequences, and (3) integrating a Next Action Loss function for enhanced user action forecasting. To overcome latency and storage constraints, we leverage efficient data-processing strategies and model-serving optimizations, enabling seamless industrial-scale deployment. Our approach's effectiveness is further demonstrated through ablation studies. Furthermore, extensive offline and online A/B experiments confirm major gains in key metrics, including engagement volume and recommendation diversity, showcasing TransAct V2's real-world impact.
AutoRuleSQL: Hybrid Text-to-SQL via Rule-Driven Fast Paths and LLM Bootstrapping
Natural Language to SQL (NL2SQL) enables natural language access to structured data, but LLM-based methods can be inefficient for real-time use and repetitive query patterns. We present AutoRuleSQL, a hybrid system that combines template-based fast paths with LLM fallback and offline bootstrapping. Empirical results show that it reduces latency by over 12.6% and improves execution accuracy by up to 4.0%, when combined with existing NL2SQL methods.
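A minimal sketch of the hybrid routing is shown below: template rules answer recurring patterns on a fast path, and unmatched questions fall through to a stubbed LLM call whose outputs could later be mined offline to bootstrap new templates. The rules and table names are hypothetical, not AutoRuleSQL's actual rule set.

# Minimal sketch of rule-driven fast paths with an LLM fallback for NL2SQL.
import re

RULES = [
    (re.compile(r"how many orders did customer (\w+) place", re.I),
     "SELECT COUNT(*) FROM orders WHERE customer_id = '{0}';"),
    (re.compile(r"total revenue in (\d{4})", re.I),
     "SELECT SUM(amount) FROM orders WHERE strftime('%Y', order_date) = '{0}';"),
]

def llm_fallback(question: str) -> str:
    # placeholder for the slower LLM path; unmatched questions land here
    return f"-- LLM-generated SQL for: {question}"

def nl2sql(question: str) -> str:
    for pattern, template in RULES:
        m = pattern.search(question)
        if m:
            return template.format(*m.groups())   # fast path: no LLM call needed
    return llm_fallback(question)

print(nl2sql("How many orders did customer C042 place?"))
print(nl2sql("Which products were returned most often last month?"))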
Reliable and Efficient Container Orchestration of LLMs via MCP
This paper presents a structured decoding approach to support reliable and efficient container orchestration using large language models (LLMs) in conjunction with the Model Context Protocol (MCP), a standard interface for LLMs to interact with Docker and Kubernetes. We address key challenges in using LLMs for container orchestration: high token overhead from outputs and the risk of generating invalid or unsafe commands. Empirical results demonstrate up to a 76.2% latency reduction.
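The guardrail idea can be sketched as follows: the model emits a compact JSON action rather than a free-form command, and the action is validated against an allowlist before anything reaches Docker or Kubernetes. The schema and verbs below are illustrative and are not the MCP interface itself.

# Minimal sketch of validating a structured LLM action against an allowlist.
import json

ALLOWED = {
    "scale":   {"deployment", "replicas"},
    "restart": {"deployment"},
    "logs":    {"pod", "tail"},
}

def validate(action_json: str) -> dict:
    action = json.loads(action_json)          # reject non-JSON output outright
    verb, args = action.get("verb"), action.get("args", {})
    if verb not in ALLOWED:
        raise ValueError(f"verb '{verb}' is not allowed")
    if not set(args) <= ALLOWED[verb]:
        raise ValueError(f"unexpected arguments for '{verb}': {set(args) - ALLOWED[verb]}")
    return action

# e.g. model output for "scale the web frontend to 3 replicas"
proposed = '{"verb": "scale", "args": {"deployment": "web-frontend", "replicas": 3}}'
print(validate(proposed))                     # only validated actions reach the orchestrator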
SESSION: Workshops
DESERE: The 2nd Workshop on Decentralized Search and Recommendation
The growing demand for data ownership and privacy is reshaping how information is accessed, managed, integrated, and recommended. Building on the inaugural DESERE workshop at The Web Conference 2024, this second edition advances research on Decentralised Search and Recommendation platforms such as Personal Online Datastores (PODs), where users retain control of their data and explicitly manage permissions. As ecosystems decentralise, traditional information retrieval must be revisited while standards for new techniques and system designs are developed to ensure efficient, accurate, and privacy-preserving search. The Second DESERE workshop at CIKM 2025 focuses on infrastructures and retrieval algorithms for user-controlled data. It convenes a cross-disciplinary community spanning data retrieval, management and integration, semantic technologies, recommendation systems, privacy-aware computing, and search efficiency to explore approaches that prioritize user agency, data ownership, and scalable retrieval across PODs and related architectures. Through paper presentations, panels, and interactive sessions, the workshop will highlight challenges, opportunities, and solutions for privacy-preserving IR. These discussions are especially relevant to domains where user-centric design and data stewardship are critical, such as personal finance, education, and high-stakes areas like criminal justice and health.
International Workshop on Multimodal Generative Search and Recommendation (MMGenSR@CIKM 2025)
Recent breakthroughs in generative Artificial Intelligence (AI) have ignited a revolutionary wave across information retrieval and recommender systems. This workshop serves as a premier interdisciplinary platform to explore how generative models, particularly Large Language Models (LLMs) and Large Multimodal Models (LMMs), are transforming multimodal search and recommendation paradigms [3, 6, 9, 10, 12-14]. We aim to convene researchers and practitioners to discuss innovative architectures, methodologies, and evaluation strategies spanning generative document retrieval [5, 8], generative image retrieval [7, 16], grounded answer generation [17], generative recommendation [2, 4, 11], and related tasks involving multiple modalities [1, 15]. The workshop will facilitate discussions on improving algorithms, generating personalized content, evolving user-system interactions, enhancing trustworthiness, and refining evaluation methodologies for these cutting-edge systems. This timely workshop seeks to identify promising research directions, address key challenges, and foster collaborations towards the development of next-generation intelligent systems.
ProActLLM: Proactive Conversational Information Seeking with Large Language Models
Large Language Models (LLMs) have transformed information access by enabling human-like text understanding and generation. This workshop explores the next step for conversational AI: building proactive information-seeking assistants that go beyond reactive question answering. We aim to investigate how LLMs can anticipate user needs, model complex context, support mixed-initiative interactions, integrate retrieval and external tools, personalize responses, adapt through feedback, and ensure fairness, transparency, and cognitive grounding. Bringing together experts from NLP, IR, HCI, and cognitive science, the workshop will serve as a timely forum for advancing intelligent, proactive dialogue systems. It will also foster interdisciplinary collaboration.
Human-Centric AI: From Explainability and Trustworthiness to Actionable Ethics
To address the potential risks of AI while supporting innovation and ensuring responsible adoption, there is an urgent need for clear governance frameworks grounded in human-centric values. It is imperative that AI systems operate in ways that are transparent, trustworthy, and ethically sound. Developing truly human-centric AI goes beyond technical innovation. It requires interdisciplinary collaboration and diverse perspectives. This workshop will explore key challenges and emerging solutions in the development of human-centric AI, with a focus on explainability, trustworthiness, fairness, and privacy. We welcome both theoretical contributions and practical case studies that demonstrate how human-centered principles are realized in real-world AI systems. The official workshop webpage is available at https://xai.kaist.ac.kr/Workshop/hcai2025/, which provides comprehensive information about the program.
Advances in Medical Knowledge Systems: LLMs, RAG and Foundation Models
This workshop will explore the latest approaches to medical knowledge systems, with a focus on the synergy between large language models, retrieval-augmented generation, and foundation/agentic models. The workshop will promote interdisciplinary collaboration among researchers, practitioners, and clinicians to advance evidence-driven AI in healthcare. Topics will include knowledge-grounded question answering, biomedical document retrieval, multimodal clinical reasoning, personalization, safety, and the challenges of deploying AI in practice. With a strong emphasis on reproducibility, evaluation, and responsible application in clinical settings, the workshop will define the next frontier of knowledge-centric AI in medicine.
SIoTEc 2025 - 6th edition of ACM Workshop on Secure IoT, Edge and Cloud systems
In recent years, we have seen an increase in the number of Artificial Intelligence (AI)-powered applications for information retrieval and data science. This has led to an increasing reliance on distributed computing infrastructures, including Cloud, Edge, and IoT environments. These architectures enable powerful and scalable solutions but also introduce new security and privacy risks that must be addressed at both the system and data levels. Even a single breach in any link of the data-service-infrastructure chain may seriously compromise the security of the end-user application. With such a wide attack surface, security must be approached holistically and addressed in every layer where concerns may arise. SIoTEc solicits novel and innovative ideas, proposals, positions, and best practices that address the modelling, design, implementation, and enforcement of security in Cloud/Edge/IoT environments. Workshop website: https://siotec.netsons.org/
The 1st International Workshop on Retrieval-driven Generative AI & ScienceON AI Challenge: RDGENAI 2025
Retrieval-augmented generation (RAG) has rapidly emerged as a cornerstone for building trustworthy and efficient generative AI systems, spanning from unimodal question answering to complex multimodal reasoning. The 1st International Workshop on Retrieval-driven Generative AI gathers researchers and practitioners focused on the applied side of RAG, particularly for visually rich document understanding (VRDU), where text, layout, and images intertwine. New in 2025, this workshop is co-located with the ScienceON AI Challenge, an open competition benchmarking the reliability of AI-generated summaries over ScienceON's public OpenAPI search results. By combining a research workshop with a hands-on challenge, we provide a complete pipeline from algorithms to real-world evaluation. The half-day event features invited keynotes, peer-reviewed papers, challenge finalist talks, and a poster/demo session, catalyzing collaboration toward robust, explainable, and domain-adapted generative AI.
Advances in Financial AI: Innovations, Risk, and Responsibility in the Era of LLMs
The finance sector is seeing a rapid increase in the application of machine learning and AI, with Large Language Models (LLMs), ESG (Environmental, Social, and Governance) investing, and AI Safety significantly reshaping the field. This workshop focuses on how these advancements intersect with core financial AI applications. We will foster interdisciplinary discussion on applying LLMs to finance, addressing challenges in multilingual and non-English markets like Korea. The event will also highlight the integration of ESG signals into algorithmic decision-making and explore AI Safety, emphasizing reliability, fairness, and explainability for AI systems in regulated financial environments. By bringing together experts from academia, industry, and regulatory bodies, the workshop aims to stimulate discussions on practical issues, ethical dilemmas, and cutting-edge research shaping financial AI's future. We welcome submissions that combine technical rigor with societal relevance in AI-driven financial decisions.
SmaLLEXT: 1st Workshop on Small and Efficient Large Language Models for Knowledge Extraction
The SmaLLEXT workshop (Small and Efficient LLMs for Knowledge Extraction) brings together researchers and practitioners working on compact models that can run under tight memory and latency budgets while still delivering state-of-the-art accuracy in extracting structured knowledge from unstructured sources. Today's production pipelines, in finance, healthcare, legal, and web-scale analytics, need extraction systems that are fast, verifiable, and economical; current 10B+-parameter models remain out of reach for many such settings. This workshop focuses on techniques for building smaller, faster, and more adaptable LLMs tailored to knowledge extraction (KE) tasks. Core topics include model compression, knowledge distillation, efficient fine-tuning, optimized retrieval-augmented generation (RAG), and hybrid symbolic-neural approaches. Special emphasis will be placed on practical challenges, such as reducing latency, ensuring factual consistency, and improving robustness in noisy or low-resource settings. By convening a diverse group of experts, the workshop aims to deepen the community's understanding of how lightweight LLMs can be effectively applied to extract, structure, and reason over knowledge at scale. Participants will gain exposure to recent breakthroughs, real-world deployments, and emerging research directions, fostering collaboration around building deployable, high-impact KE systems in both academic and industrial contexts.
The International Workshop on Spatio-Temporal Data Intelligence and Foundation Models
Spatio-temporal data intelligence, which includes sensing, managing, and mining large-scale data across space and time, plays a pivotal role in understanding complex systems in real-world applications, such as urban computing and smart cities. With the rapid evolution of foundation models and their growing potential to transform spatio-temporal analytics, we propose a comprehensive half-day workshop (with at least 5 accepted papers, 3 keynote talks, 1 panel discussion, and over 50 attendees) at CIKM 2025, catering to professionals, researchers, and practitioners who are interested in spatio-temporal data intelligence and foundation models to address real-world challenges. The workshop will not only offer a platform for knowledge exchange but also acknowledge outstanding contributions through a distinguished Best Paper Award. A dedicated panel discussion will explore recent advances, emerging trends, and open challenges in integrating spatio-temporal data and emerging machine learning techniques, fostering dialogue between academia and industry. Note that this will be the eleventh time that our core members have organized a similar workshop. The previous 10 workshops were hosted in top-tier data mining and management venues, e.g., SIGKDD, WWW, and IJCAI, each of which attracted over 60 participants and 25 submissions on average.
Recommender Systems for Sustainable Development through Responsible Nudging
Recommender Systems (RS) influence everyday decisions, yet most remain optimized for short-term engagement or commercial gain. RS4SD aims to shift this focus by exploring how RS can contribute to sustainable development through behavioral change and nudging strategies. Aligned with the UN Sustainable Development Goals (SDG), RS4SD will highlight applications that promote responsible consumption, sustainable mobility, healthy eating, and digital well-being. In particular, we will focus on how AI and RS can be designed to foster sustainable behaviors through multi-objective optimization and ethically aligned interventions. These objectives are directly tied to the UN SDG, and we welcome all contributions showcasing RS in support of these goals. A central theme of the workshop is the integration of behavioral science and AI to design interventions that guide users toward more sustainable and healthier choices while preserving individual autonomy. Topics of interest include multi-objective recommendation, health-aware RS, eco-friendly product and tourism RS, as well as novel evaluation metrics that go beyond accuracy to capture societal impact. RS4SD will bring together researchers, stakeholders and practitioners from RS, AI, sustainability, and behavioral science to share models, datasets, frameworks, and real-world use cases. The workshop encourages interdisciplinary collaboration and aims to build a community dedicated to responsible, behavior-aware RS that benefit both individuals and society.
Frontiers in Graph Machine Learning for the Large Model Era
The "Frontiers in Graph Machine Learning for the Large Model Era (GMLLM'25)" workshop focuses on advancing graph machine learning (GML) techniques in the context of increasingly large and powerful models. Graphs offer a principled way to represent structured and relational data, making them essential for capturing complex dependencies in knowledge, systems, and behaviors. As the scale and influence of foundation models grow, graph learning stands at a unique vantage point to enhance model robustness, improve interpretability, and integrate domain-specific relational priors. This workshop explores how graph learning can support emerging needs in knowledge reasoning, temporal and multi-hop inference, and AI systems. It also investigates how advances in representation learning, structure-aware generalization, and efficient graph processing can contribute to trustworthy and scalable AI systems. By convening experts in graph learning, knowledge management, and LLMs, the workshop aims to identify core challenges and opportunities of GML in the large model era.
Trustworthy Knowledge Discovery and Data Mining (TrustKDD)
The explosion of data and the widespread adoption of AI techniques, especially the success of foundation models and generative AI, have transformed knowledge discovery and data mining (KDD), making them integral to real-world decision-making. For both traditional AI methods and generative AI, issues such as data noise, algorithmic bias, lack of interpretability, and privacy concerns can significantly impact the quality and reliability of extracted knowledge, thereby affecting downstream decision-making. This workshop aims to bring together researchers and practitioners from information and knowledge management, data mining, and intelligent systems to explore trustworthy KDD across diverse settings in the generative AI era. We welcome contributions on robust data preprocessing, explainable learning algorithms, bias detection and mitigation, secure and privacy-preserving mining, trustworthy knowledge graph construction, resource-efficient deployment, alignment of foundation models, and applications for social good. Special emphasis is placed on emerging challenges posed by large-scale, pre-trained models in dynamic, multi-source, and user-centric environments. By fostering dialogue between traditional KDD approaches and innovations in the foundation model era, TrustKDD seeks to advance trustworthy methodologies that align with CIKM's mission of developing reliable, scalable, and intelligent information and knowledge systems.
The 1st Workshop on LLM Agents for Social Simulation
Social simulation has long played a crucial role in exploring the mechanisms underlying human behavior and societal structures. Traditional social simulation relies on rule-based or statistical models, which makes it difficult to capture the complexity and variability of the real world. With the emergence and rapid development of large language models (LLMs), new frontiers have been opened toward leveraging LLMs as agents to model human behavior and interactions. This cutting-edge direction has gained significant attention and demonstrated promising results, not only advancing research across a wide range of social science disciplines, but also enabling practical applications in role-playing scenarios. However, this field still faces multiple challenges, such as faithfully capturing real-world social phenomena, mitigating bias and addressing ethical considerations, and ensuring usability and reliability. This workshop on LLM Agents for Social Simulation (LASS) aims to bring together researchers and practitioners from diverse backgrounds to foster interdisciplinary collaboration, address key challenges, explore new technologies, and chart promising future directions in this rapidly evolving field.