CIKM '25: Proceedings of the 34th ACM International Conference on Information and Knowledge Management
SESSION: Keynote Talks
The Geometry of Knowledge and Computational Discovery
Modern neural networks transform vast datasets into continuous embedding spaces, translating
semantic relationships into geometric structures. The key to unlocking their full
potential lies in making these representations interpretable. This talk presents a
simple theory underlying the interpretability of the embedding space and how this
principle can allow us to analyze the data, design new metrics, and model the dynamics
of the system, moving beyond black-box models to data-driven interpretable insights.
AI Planning for Data Exploration
Data Exploration is an incremental process that helps users express what they want
through a conversation with the data. A large body of work has focused on automating data
exploration (e.g., to explore very large galaxy data in SDSS [6, 7], to summarize
large datasets [8, 9], or to explore ratings [2, 3] and search for products [5]).
Reinforcement Learning (RL) is one of the most notable approaches to automate data
exploration and several solutions have been proposed. With the advent of Large Language
Models and their ability to reason sequentially, it has become legitimate to ask the
question: would LLMs and, more generally, AI planning outperform a customized RL policy
in data exploration [1]? More specifically, would LLMs help circumvent retraining
for new tasks and strike a balance between specificity and generality [4]? This talk
will attempt to answer this question by reviewing RL training and policy reusability
for data exploration. This talk will start with an overview of exploratory data analysis
and the various uses of RL to automate this online decision-making process. Then I
will introduce AI planning and the need for policy reusability in RL. The last part
of the talk will discuss pressing questions on AI planning applied to data exploration,
including memory management, evaluation, and responsible deployment.
Numerical Linear Algebraic Foundations for Large-Scale Unsupervised Learning
Numerical Linear Algebra provides essential foundations in many large-scale data analytic
tasks. In this talk, it is illustrated that some of the powerful methods, especially
for unsupervised tasks such as clustering, topic modeling, community detection, embedding,
and representation learning, can be derived from a framework of low-rank approximation
(LRA). These include the ubiquitous singular value decomposition (SVD), latent semantic
indexing (LSI), principal component analysis (PCA), and the constrained LRA (CLRA)-based
methods such as nonnegative matrix factorization (NMF) and its variants such as Symmetric
NMF (SymNMF), and JointNMF. It is shown that all these methods can be explained using
one framework which can then be further generalized into more advanced methods such
as co-clustering and co-embedding for more complex situations including multi-view
and multi-granularity data sets, and into semi-supervised methods incorporating prior
knowledge. The presented algorithms that utilize advances in numerical linear algebra
are shown to achieve scalability, efficiency, and effectiveness. Substantial experimental
results on synthetic and real-life problems illustrate significant benefits of exploiting
numerical linear algebra-based methods in many data analytic tasks.
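For reference, the constrained low-rank approximation (CLRA) framework the talk refers to can be summarized by the optimization problem below; the specific formulations used in the talk may differ.

\[
\min_{W,\,H} \; \|A - WH\|_F^2 , \qquad A \in \mathbb{R}^{m \times n},\; W \in \mathbb{R}^{m \times k},\; H \in \mathbb{R}^{k \times n},\; k \ll \min(m, n),
\]

where leaving W and H unconstrained yields the truncated SVD as the optimal solution (and hence LSI, and PCA after column centering of A), constraining W ≥ 0 and H ≥ 0 gives NMF, and the symmetric variant \(\min_{H \ge 0} \|A - HH^{\mathsf{T}}\|_F^2\) gives SymNMF.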
SESSION: Full Research Papers
ADMP-GNN: Adaptive Depth Message Passing GNN
- Yassine Abbahaddou
- Fragkiskos D. Malliaros
- Johannes F. Lutzeyer
- Michalis Vazirgiannis
Graph Neural Networks (GNNs) have proven to be highly effective in various graph learning
tasks. A key characteristic of GNNs is their use of a fixed number of message-passing
steps for all nodes in the graph, regardless of each node's diverse computational
needs and characteristics. Through empirical real-world data analysis, we demonstrate
that the optimal number of message-passing layers varies for nodes with different
characteristics. This finding is further supported by experiments conducted on synthetic
datasets. To address this, we propose Adaptive Depth Message Passing GNN (ADMP-GNN), a novel framework that dynamically adjusts the number of message passing layers for
each node, resulting in improved performance. This approach applies to any model that
follows the message passing scheme. We evaluate ADMP-GNN on the node classification
task and observe performance improvements over baseline GNN models. Our code is publicly
available at: https://github.com/abbahaddou/ADMP-GNN
EvenOddML: Even and Odd Aggregation with Multi-Level Contrastive Learning for Bipartite
Graph
- Manasvi Aggarwal
- Jahnavi Methukumalli
- Deepanshu Bagotia
- Suhas Powar
Bipartite graphs, which model relationships between two distinct entity types, are
common in various applications. Existing bipartite graph neural networks (GNNs) often
fail to capture both local and global structures and inadequately consider the indirect
and direct influences from same-type nodes, leading to suboptimal performance. To
address these issues, we propose the EvenOddML model, a contrastive learning based
node representation learning module for bipartite graphs. The model comprises an Even-Odd
encoder (an even-and-odd aggregation module) that aggregates information from immediate
neighbors as well as 2-hop neighbors, both directly and indirectly. We also introduce
a novel three-level contrastive learning framework (Layer Level, Type-Global Level,
and Network-Global Level) that hierarchically maximizes mutual information by integrating
local and global information at various scales. We evaluate EvenOddML on recommendation
and link prediction tasks, showing its effectiveness over state-of-the-art methods
in bipartite graph representation learning.
Quantization Aware Matryoshka Adaptation: Leveraging Matryoshka Learning, Quantization,
and Bitwise Operations for Reduced Storage and Improved Retrieval Speed
We introduce Quantization Aware Matryoshka Adaptation (QAMA), a unified framework for creating compact yet semantically rich embeddings through
Matryoshka Representation Learning and multi-level quantization. Our approach learns nested embeddings that gracefully
shrink to smaller dimensional subsets and leverages bitwise operations (XOR, NOT,
POPCOUNT) for efficient retrieval. By augmenting transformer-based encoders with lightweight
feedforward layers and specialized regularization (Matryoshka Loss, Orthogonality,
Information Bottleneck, and Quantization Loss), we produce quantization-friendly representations
that preserve essential information in early dimensions.
We explore 0.5-bit, 1-bit, 1.5-bit, and 2-bit quantization levels, as well as a Hybrid
Quantization scheme that adaptively allocates higher precision to dimensions with
greater information content. Across extensive evaluations on Modern BERT (MB) and
MiniLM models, 2-bit quantization and Hybrid Quantization consistently recover up to 95-98% of the original full-precision (FP32) performance
while reducing memory usage by over 90%. Even at low embedding dimensions (e.g., 96-192),
QAMA's hierarchical training ensures that performance remains surprisingly robust,
highlighting the effectiveness of our bit-level expansions and nested representation
learning.
Our proposed end-to-end training with quantization-aware loss yields embeddings that
map cleanly into discrete levels, supporting rapid Hamming distance calculations for
semantic similarity search. Our framework QAMA offers a practical means to optimize embedding storage and retrieval speed for information
retrieval systems.
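As a minimal illustration of the bitwise retrieval idea mentioned in the abstract (not the QAMA implementation), the sketch below applies 1-bit quantization to an embedding, packs it into bytes, and ranks candidates by Hamming distance computed with XOR plus a population count; the helper names and the threshold-at-zero binarization are assumptions for the example.

```python
import numpy as np

def pack_binary_embedding(x: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Quantize a float embedding to 1 bit per dimension and pack 8 bits per byte."""
    bits = (x > threshold).astype(np.uint8)
    return np.packbits(bits)

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two packed codes via XOR followed by a popcount."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

# Toy usage: rank a small corpus of 96-dimensional embeddings against a query.
rng = np.random.default_rng(0)
corpus = [pack_binary_embedding(rng.standard_normal(96)) for _ in range(1000)]
query = pack_binary_embedding(rng.standard_normal(96))
ranking = sorted(range(len(corpus)), key=lambda i: hamming_distance(query, corpus[i]))
```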
Efficient Knowledge Transfer from Large to Small Language Models via Low-Overhead
Query Mechanism
Small language models offer computational efficiency but often lack the performance
of larger models. We introduce a novel query mechanism enabling small models to efficiently
extract knowledge from large models during inference. Our approach executes the large
model on a single vector prompt, significantly reducing computational overhead compared
to full model execution.
Using a 1B-parameter Llama 3.2 as the small model and 3B/8B Llama models as knowledge
sources, we evaluate on 20 diverse benchmarks spanning reasoning, factual recall,
and reading comprehension tasks. Our best method achieves substantial improvements
over the small model baseline, with particularly strong gains on factual memory tasks
(average +114.9% relative improvement). Notable results include improvements on TriviaQA
(74.4% vs 35.4% baseline), Freebase Questions (42.5% vs 14.6%), and Natural Questions
(34.9% vs 12.6%). Our approach consistently outperforms traditional fine-tuning methods
while maintaining efficiency, achieving improvements of over 41% across multiple tasks
while incurring only 31% additional compute over the 1B baseline.
Mixed data k-Anonymization by Consistent Maximal Association and Microaggregation
- Julien Ah-Pine
- Nathaniel Gbenro
This paper addresses the challenge of anonymizing mixed data, comprising both categorical
(qualitative) and numerical (continuous) variables, while preserving data utility.
The inherent heterogeneity of such data complicates the use of traditional anonymization
methods. To overcome this limitation, we propose a novel microaggregation-based framework
for k-anonymization that integrates statistical association measures applicable to both
variable types, ensuring a coherent and consistent treatment. Our approach, called
Mix-R², relies on a unified set of core concepts grounded in analysis of variance, enabling
the application of a common methodology to both categorical and numerical attributes.
By leveraging these consistent association measures, the framework improves the robustness
of the k-anonymization process, delivering strong privacy protection while maintaining high
data utility. Numerical experiments on benchmark datasets demonstrate the effectiveness
and advantages of our method, highlighting its contribution to privacy-preserving
analysis of mixed-type data.
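To make the microaggregation idea concrete (a generic sketch of k-anonymization by microaggregation, not the Mix-R² procedure), the toy function below partitions records into groups of at least k similar records and replaces quasi-identifier values with a group-level aggregate; the grouping by a 1-D projection and the column arguments are assumptions for the example.

```python
import numpy as np
import pandas as pd

def microaggregate(df: pd.DataFrame, k: int, num_cols: list, cat_cols: list) -> pd.DataFrame:
    """Toy k-anonymization by microaggregation: form groups of >= k similar records,
    then replace numerical values with the group mean and categorical values with
    the group mode, so every released record is shared by at least k individuals."""
    # Order records by a crude 1-D projection so that neighbours are similar.
    z = (df[num_cols] - df[num_cols].mean()) / (df[num_cols].std() + 1e-9)
    order = z.sum(axis=1).sort_values().index
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:      # fold an undersized tail group in
        groups[-2] = groups[-2].append(groups[-1])
        groups.pop()
    out = df.copy()
    for g in groups:
        out.loc[g, num_cols] = df.loc[g, num_cols].mean().values
        out.loc[g, cat_cols] = df.loc[g, cat_cols].mode().iloc[0].values
    return out
```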
LLM-Powered Information Extraction for the Dairy Financial Domain: Tackling Data Scarcity
and Ambiguity
- Chunyan An
- Yuying Huang
- Qiang Yang
- Siyu Yuan
- Zhixu Li
Information extraction is a critical technology for intelligent analysis and risk
assessment in the dairy financial domain. However, real-world applications face three
major challenges: the complexity and diversity of entity-relation types, significant
data imbalance, and ambiguity in textual expressions. Traditional methods often fail
to capture rare patterns, struggle with vague mentions, and exhibit poor generalization
in low-resource settings. To address these issues, we propose a novel framework that
integrates large language models (LLMs) with targeted data augmentation and agent-based
retrieval-augmented generation (RAG). Our approach builds on the BaiChuan2 model,
which is first adapted to the dairy finance domain via secondary pretraining. We introduce
a two-stage data augmentation strategy: the first stage uses ChatGPT to generate pseudo-samples
for rare types, and the second stage refines model weaknesses based on prediction-guided
feedback. These augmented datasets are used to fine-tune the model through prompt-based
supervised learning with LoRA. To further enhance robustness, we incorporate an agent-based
RAG module for completing vague or underspecified entities by retrieving external
contextual knowledge. Extensive experiments demonstrate that our framework achieves
state-of-the-art performance, with improved F1+ scores of 0.876 and 0.824 for entity recognition and relation extraction, respectively.
The RAG component boosts entity completion accuracy to 0.802 while reducing retrieval
latency by over 6x, showcasing both the effectiveness and practicality of our method
in real-world dairy financial applications.
RELINK: Edge Activation for Closed Network Influence Maximization via Deep Reinforcement
Learning
- Shivvrat Arya
- Smita Ghosh
- Bryan Maruyama
- Venkatesh Srinivasan
Influence Maximization aims to select a subset of elements in a social network to
maximize information spread under a diffusion model. While existing work primarily
focuses on selecting influential nodes, these approaches assume unrestricted message
propagation, an assumption that fails in closed social networks, where content visibility
is constrained and node-level activations may be infeasible. Motivated by the growing
adoption of privacy-focused platforms such as Signal, Discord, Instagram, and Slack,
our work addresses the following fundamental question: How can we learn effective edge activation strategies for influence maximization in
closed networks? To answer this question, we introduce Reinforcement Learning for Link Activation (RELINK), the first deep reinforcement learning (DRL) framework for edge-level influence maximization
in privacy-constrained networks. It models edge selection as a Markov Decision Process,
where the agent learns to activate edges under budget constraints. Unlike prior node-based
DRL methods, RELINK uses an edge-centric Q-learning approach that accounts for structural
constraints and constrained information propagation. Our framework combines a rich
node embedding pipeline with an edge-aware aggregation module. The agent is trained
using an n-step Double DQN objective, guided by dense reward signals that capture
marginal gains in influence spread. Extensive experiments on real-world networks show
that RELINK consistently outperforms existing edge-based methods, achieving up to
15% higher influence spread and improved scalability across diverse settings.
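For context, the abstract's n-step Double DQN objective refers to the standard target below, written here over candidate edges e (the exact reward shaping and state definition are specific to RELINK and not reproduced here):

\[
y_t \;=\; \sum_{i=0}^{n-1} \gamma^{\,i}\, r_{t+i} \;+\; \gamma^{\,n}\, Q_{\theta^-}\!\Big(s_{t+n},\, \arg\max_{e}\, Q_{\theta}(s_{t+n}, e)\Big),
\]

where the online network \(Q_{\theta}\) selects the action and the target network \(Q_{\theta^-}\) evaluates it, the decoupling that distinguishes Double DQN from the vanilla DQN target.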
Learning Global-Local Multi-Scale Node Embeddings with Random Walks and Landmark-Guided
Optimization
- Ali Assi
- Nour Elislem Karabadji
- Mohamed Elati
- Wajdi Dhifli
Learning low-dimensional representations for nodes is fundamental to various graph
learning tasks. Existing approaches often struggle to comprehensively capture the
connections between local structure, contextual information, and global node positioning
in the graph. This paper introduces GLOW, a novel approach for node representation
learning that jointly maximizes the likelihood of preserving these crucial aspects
of the graph structure. GLOW integrates 1) local structural and contextual information,
captured through random walks, with 2) global node positioning, approximated using
a set of selected landmark nodes. This strategy allows GLOW to learn node embeddings
that are structurally aware and informative for downstream tasks. We evaluate GLOW's
performance on several benchmark datasets for node classification and link prediction,
and compare it to a wide range of established baseline methods. The obtained results
consistently demonstrate that GLOW achieves substantial performance improvements.
This highlights the advantages of integrating local and global structural learning
in effectively capturing graph structure and generating highly resilient node embeddings.
Dynamic Triangulation-Based Graph Rewiring for Graph Neural Networks
- Hugo Attali
- Thomas Papastergiou
- Nathalie Pernelle
- Fragkiskos D. Malliaros
Graph Neural Networks (GNNs) have emerged as the leading paradigm for learning over
graph-structured data. However, their performance is limited by issues inherent to
graph topology, most notably oversquashing and oversmoothing. Recent advances in graph
rewiring aim to mitigate these limitations by modifying the graph topology to promote
more effective information propagation. In this work, we introduce TRIGON, a novel
framework that constructs enriched, non-planar triangulations by learning to select
relevant triangles from multiple graph views. By jointly optimizing triangle selection
and downstream classification performance, our method produces a rewired graph with
markedly improved structural properties such as reduced diameter, increased spectral
gap, and lower effective resistance compared to existing rewiring methods. Empirical
results demonstrate that TRIGON outperforms state-of-the-art approaches on node classification
tasks across a range of homophilic and heterophilic benchmarks.
Extreme Multi-Label Completion for Semantic Document Tagging with Taxonomy-Aware Parallel
Learning
- Julien Audiffren
- Christophe Broillet
- Ljiljana Dolamic
- Philippe Cudré-Mauroux
The objective of Extreme Multi-Label Completion (XMLCo) is to predict missing document
labels drawn from a very large collection. Together with Extreme Multi-Label Classification
(XMLC), XMLCo is arguably one of the most challenging document classification tasks,
as the number of potential labels is generally very large compared to the number of
labeled documents. The collection of labels is often structured in a taxonomy that
encodes relationships between labels, and many methods have been proposed to leverage
this hierarchy to improve XMLCo algorithms. In this paper, we propose a new approach
to this problem: TAMLEC (Taxonomy-Aware Multi-task Learning for Extreme multi-label
Completion). TAMLEC divides the problem into several Taxonomy-Aware Tasks, i.e. into
specific subsets of the labels drawn from paths in the taxonomy, and trains on these
tasks using a dynamic Parallel Feature sharing approach where parts of the model are
shared between tasks while others are task-specific. Then, at inference time, TAMLEC
uses the labels available in a document to predict missing labels, using the Weak-Semilattice
structure that is naturally induced by the tasks. Our empirical evaluation on real-world
datasets shows that TAMLEC substantially outperforms the state-of-the-art in XMLCo.
Furthermore, additional experiments show that TAMLEC is particularly suited for few-shot
settings, where new tasks or labels are introduced with only a few examples after initial
training.
Relation-Faceted Graph Pooling with LLM Guidance for Dynamic Span-Aware Information
Extraction
- Hye-Yoon Baek
- Jinho Choi
- Jimyeung Seo
- Xiongnan Jin
- Dongcheon Lee
- Byungkook Oh
Joint information extraction aims to convert unstructured text into structured knowledge
by identifying entities and their relations. However, existing methods often rely
on static span formation and relation-agnostic validation, limiting their ability
to capture dynamic, context-sensitive semantics. We present RePooL, a hierarchical
validation framework that performs fine-grained token-level filtering followed by
coarse-grained span-level validation, enabling robust multi-granular semantic modeling.
RePooL constructs a dual-view knowledge graph that models tokens and relations as
distinct node types. It leverages auxiliary structural relations to encode token-relation
semantic compatibility via subject and object roles and to compose multi-token spans
dynamically, thereby enabling relation-aware validation across multiple granularities.
To further strengthen semantic grounding, RePooL incorporates LLM-guided alignment,
which evaluates candidate triples against the input text to specifically reinforce
coherent extractions. Extensive experiments on standard IE benchmarks show that RePooL
achieves superior performance, demonstrating its effectiveness in modeling fine-grained
entity-relation interactions.
MUFFIN: Mixture of User-Adaptive Frequency Filtering for Sequential Recommendation
- Ilwoong Baek
- Mincheol Yoon
- Seongmin Park
- Jongwuk Lee
Sequential recommendation (SR) aims to predict users' subsequent interactions by modeling
their sequential behaviors. Recent studies have explored frequency domain analysis, which effectively models periodic patterns in user sequences. However, existing frequency-domain
SR models still face two major drawbacks: (i) limited frequency band coverage, often missing critical behavioral patterns in a specific frequency range, and (ii)
lack of personalized frequency filtering, as they apply an identical filter for all users regardless of their distinct frequency
characteristics. To address these challenges, we propose a novel frequency-domain
model, Mixture of User-adaptive Frequency FIlteriNg (MUFFIN), which operates through two complementary modules. (i) The global filtering module (GFM) handles the entire frequency spectrum to capture comprehensive behavioral patterns.
(ii) The local filtering module (LFM) selectively emphasizes important frequency bands without excluding information from
other ranges. In both modules, a user-adaptive filter (UAF) generates user-specific frequency filters tailored to each user's unique
characteristics. Finally, by aggregating both modules, MUFFIN captures diverse user
behavioral patterns across the full frequency spectrum. Extensive experiments show
that MUFFIN consistently outperforms state-of-the-art frequency-domain SR models over
five benchmark datasets. The source code is available at https://github.com/ilwoong100/MUFFIN.
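As a generic sketch of frequency-domain filtering for a behavior sequence (the building block the abstract describes, not the MUFFIN architecture or its user-adaptive filters), the module below applies an FFT along the time axis, multiplies by a learnable complex filter, and transforms back; the shapes and initialization are assumptions for the example.

```python
import torch
import torch.nn as nn

class LearnableFrequencyFilter(nn.Module):
    """Toy global frequency filter for a user sequence: FFT along the time axis,
    multiply by a learnable complex-valued filter, then inverse FFT back."""
    def __init__(self, seq_len: int, hidden_dim: int):
        super().__init__()
        n_freq = seq_len // 2 + 1                         # rfft output length
        self.filter = nn.Parameter(torch.randn(n_freq, hidden_dim, 2) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, seq_len, hidden)
        spec = torch.fft.rfft(x, dim=1)                    # (batch, n_freq, hidden), complex
        spec = spec * torch.view_as_complex(self.filter)   # elementwise learnable filtering
        return torch.fft.irfft(spec, n=x.size(1), dim=1)   # back to (batch, seq_len, hidden)

# Usage: filtered = LearnableFrequencyFilter(seq_len=50, hidden_dim=64)(torch.randn(8, 50, 64))
```

A user-adaptive variant would generate the filter from a user representation rather than sharing a single filter across all users, which is the personalization gap the abstract highlights.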
Subclass-Aware Inclusive Classifier via Repulsive Hidden Strata
- Namita Bajpai
- Jiaul H Paik
- Sudeshna Sarkar
Classification models in machine learning are typically trained using coarse-grained
class labels. Although these models often achieve strong overall accuracy, their performance
is asymmetric across the subclasses that arise out of a common phenomenon called hidden
stratification. Generally, the latent subclasses within each class differ substantially
in distribution and characteristics, resulting in poor generalization for underrepresented
groups. Moreover, imbalanced subclass distributions lead to majority subclasses dominating
training, resulting in biased and less reliable models, especially for safety-critical
applications (such as medical). To address these challenges, we propose a novel framework
that attempts to uncover hidden subclasses via a repulsive point process. Our approach
then leverages these fine-grained labels to make the classifier more inclusive across
the subclasses. Our approach identifies subclasses without requiring additional supervision,
thereby promoting diversity and reducing sensitivity to subclass imbalance. Extensive
experiments on four benchmark datasets demonstrate consistent and significant improvements
over state-of-the-art baselines across both balanced and imbalanced subclass distributions,
underscoring the effectiveness and generalizability of our approach.
A Robust Clustered Federated Learning Approach for Non-IID Data with Quantity Skew
- Michael Ben Ali
- Imen Megdiche
- André Péninou
- Olivier Teste
Federated Learning (FL) is a decentralized paradigm that enables a client-server architecture
to collaboratively train a global Artificial Intelligence model without sharing raw
data, thereby preserving privacy. A key challenge in FL is Non-IID data. Quantity
Skew (QS) is a particular problem of Non-IID, where clients hold highly heterogeneous
data volumes. Clustered Federated Learning (CFL) is an emergent variant of FL that
presents a promising solution to the Non-IID problem. It improves model performance
by grouping clients with similar data distributions into clusters. CFL methods generally
fall into two operating strategies. In the first strategy, clients select the cluster
that minimizes the local training loss. In the second strategy, the server groups
clients based on local model similarities. However, most CFL methods have not been systematically
evaluated under QS, even though it poses significant challenges for them.
In this paper, we present two main contributions. The first one is an evaluation of
state-of-the-art CFL algorithms under various Non-IID settings, applying multiple
QS scenarios to assess their robustness. Our second contribution is a novel iterative
CFL algorithm, named CORNFLQS, which proposes an optimal coordination between both
operating strategies of CFL. Our approach is robust against the different variations
of QS settings. We conducted intensive experiments on six image classification datasets,
resulting in 270 Non-IID configurations. The results show that CORNFLQS achieves the
highest average ranking in both accuracy and clustering quality, as well as strong
robustness to QS perturbations. Overall, our approach outperforms existing CFL algorithms.
Large Model Annotation-Enhanced Spatio-Temporal Fusion Knowledge Tracing Model
- Tianyu Cai
- Xiaodi Huang
- Tao Zhou
- Yanting Li
- Shenggen Ju
Knowledge Tracing (KT) aims to model students' evolving knowledge states based on
their interactions, supporting downstream applications such as personalized resource
recommendation. Recent works have leveraged large language models (LLMs) to annotate
spatial topological structures (e.g., prerequisite and hierarchical relations) between
knowledge concepts, significantly reducing manual annotation costs. However, two major
challenges remain: (1) low interpretability and potential unreliability due to LLM-induced
annotation errors or hallucinations, and (2) limited cross-hierarchical interactions
that hinder model performance. To address these issues, we propose a Large Model Annotation-Enhanced
Spatio-Temporal Fusion KT model. First, we introduce a Detection-Reannotation Strategy
to mitigate LLM annotation errors and hallucinations, resulting in a more accurate
Knowledge Concept (KC) relation graph. Second, we present a Unit Relation Graph Annotation
Method to reduce the distance between cross-hierarchical nodes, thereby enhancing
their interactions. Lastly, we propose a Spatio-Temporal Fusion Framework, incorporating
a dual-view contrastive learning module and a graph-structured knowledge state propagation
module, to more effectively model students' knowledge propagation. Experiments on
three real-world educational datasets demonstrate that our method effectively improves
the practicality and reliability of LLM annotations, achieving state-of-the-art performance.
Moreover, the knowledge propagation process based on the annotated graph enhances
interpretability for educational applications.
ProtoEHR: Hierarchical Prototype Learning for EHR-based Healthcare Predictions
- Zi Cai
- Yu Liu
- Zhiyao Luo
- Tingting Zhu
Digital healthcare systems have enabled the collection of mass healthcare data in
electronic healthcare records (EHRs), allowing artificial intelligence solutions for
various healthcare prediction tasks. However, existing studies often focus on isolated
components of EHR data, limiting their predictive performance and interpretability.
To address this gap, we propose ProtoEHR, an interpretable hierarchical prototype
learning framework that fully exploits the rich, multi-level structure of EHR data
to enhance healthcare predictions. More specifically, ProtoEHR models relationships
within and across three hierarchical levels of EHRs: medical codes, hospital visits,
and patients. We first leverage large language models to extract semantic relationships
among medical codes and construct a medical knowledge graph as the knowledge source.
Building on this, we design a hierarchical representation learning framework that
captures contextualized representations across three levels, while incorporating prototype
information within each level to capture intrinsic similarities and improve generalization.
To perform a comprehensive assessment, we evaluate ProtoEHR on two public datasets
across five clinically significant tasks: mortality prediction, readmission prediction,
length-of-stay prediction, drug recommendation, and phenotype prediction.
The results demonstrate the ability of ProtoEHR to make accurate, robust,
and interpretable predictions compared to baselines in the literature. Furthermore,
ProtoEHR offers interpretable insights on code, visit, and patient levels to aid in
healthcare prediction.
Force Matching with Relativistic Constraints: A Physics-Inspired Approach to Stable
and Efficient Generative Modeling
- Yang Cao
- Bo Chen
- Xiaoyu Li
- Yingyu Liang
- Zhizhou Sha
- Zhenmei Shi
- Zhao Song
- Mingda Wan
This paper introduces Force Matching (ForM), a novel framework for generative modeling
that represents an initial exploration into leveraging special relativistic mechanics
to enhance the stability of the sampling process. By incorporating the Lorentz factor,
ForM imposes a velocity constraint, ensuring that sample velocities remain bounded
within a constant limit. This constraint serves as a fundamental mechanism for stabilizing
the generative dynamics, leading to a more robust and controlled sampling process.
We provide a rigorous theoretical analysis demonstrating that the velocity constraint
is preserved throughout the sampling procedure within the ForM framework. To validate
the effectiveness of our approach, we conduct extensive empirical evaluations. On
the half-moons dataset, ForM significantly outperforms baseline methods, achieving the lowest Euclidean
distance loss of 0.714, in contrast to vanilla first-order flow matching (5.853) and first- and second-order
flow matching (5.793). Additionally, we perform an ablation study to further investigate
the impact of our velocity constraint, reaffirming the superiority of ForM in stabilizing
the generative process. The theoretical guarantees and empirical results underscore
the potential of integrating special relativity principles into generative modeling.
Our findings suggest that ForM provides a promising pathway toward achieving stable,
efficient, and flexible generative processes. This work lays the foundation for future
advancements in high-dimensional generative modeling, opening new avenues for the
application of physical principles in machine learning.
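For reference, the Lorentz factor mentioned in the abstract and the standard relativistic momentum-to-velocity map are given below; any finite momentum p maps to a speed strictly below the constant c, which is the bounding mechanism the abstract alludes to (its exact role inside ForM is defined in the paper):

\[
\gamma(v) \;=\; \frac{1}{\sqrt{1 - \|v\|^2/c^2}}, \qquad
v(p) \;=\; \frac{p}{m\sqrt{1 + \|p\|^2/(m^2 c^2)}}, \qquad \|v(p)\| < c .
\]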
Dynamic Graph Learning via Historical Information Perception and Multi-Granular Temporal
Curriculum Learning
- Yuehang Cao
- Xiang Zhao
- Yang Fang
- Yan Pan
- Jiuyang Tang
Dynamic graph representation learning has emerged as a pivotal paradigm for modeling
time-varying relational patterns in complex systems ranging from social networks to
urban mobility. While existing methods achieve notable progress in temporal modeling,
one critical challenge remains insufficiently addressed: identifying dual temporal
evolution, i.e., instantaneous states and evolutionary trajectories. To address this
challenge, we propose HMGNN, a novel dynamic graph learning framework that harmoniously
integrates temporal dynamics modeling with stable structural representation learning,
allowing adaptive pattern discovery while preserving feature consistency in evolving
environments. Firstly, we propose a dynamic model that integrates a historical information
perception module and a temporal aggregation module. These modules feed historical
information into the model and adaptively measure the impact of instantaneous
and historical information through the aggregation function. Secondly,
we devise a dual-component model learning framework comprising contrastive learning
and multi-granular temporal curriculum learning to holistically capture evolutionary
dynamics. The contrastive learning component employs continuous-view contrastive alignment
to preserve stable node features across temporal evolution. Complementarily, our multi-granular
temporal curriculum learning introduces a masking mechanism to explicitly learn different
time interval evolution patterns. Extensive experiments demonstrate the significant
superiority of HMGNN against state-of-the-art dynamic graph learning methods in terms
of all evaluation metrics.
Enhancing Contrastive Link Prediction With Edge Balancing Augmentation
- Chen-Hao Chang
- Hui-Ju Hung
- Chia-Hsun Lu
- Chih-Ya Shen
Link prediction is one of the most fundamental tasks in graph mining, which motivates
the recent studies of leveraging contrastive learning to enhance the performance.
However, we observe two major weaknesses of these studies: i) the lack of theoretical
analysis for contrastive learning on link prediction, and ii) inadequate consideration
of node degrees in contrastive learning. To address the above weaknesses, we provide
the first formal theoretical analysis for contrastive learning on link prediction,
where our analysis results can generalize to the autoencoder-based link prediction
models with contrastive learning. Motivated by our analysis results, we propose a
new graph augmentation approach, Edge Balancing Augmentation (EBA), which adjusts the node degrees in the graph as the augmentation. We then propose
a new approach, named Contrastive Link Prediction with Edge Balancing Augmentation (CoEBA), that integrates the proposed EBA and the proposed new contrastive losses
to improve the model performance. We conduct experiments on 8 benchmark datasets.
The results demonstrate that our proposed CoEBA significantly outperforms the other
state-of-the-art link prediction models.
PromptTSS: A Prompting-Based Approach for Interactive Multi-Granularity Time Series
Segmentation
- Ching Chang
- Ming-Chih Lo
- Wen-Chih Peng
- Tien-Fu Chen
Multivariate time series data, collected across various fields such as manufacturing
and wearable technology, exhibit states at multiple levels of granularity, from coarse-grained
system behaviors to fine-grained, detailed events. Effectively segmenting and integrating
states across these different granularities is crucial for tasks like predictive maintenance
and performance optimization. However, existing time series segmentation methods face
two key challenges: (1) the inability to handle multiple levels of granularity within
a unified model, and (2) limited adaptability to new, evolving patterns in dynamic
environments. To address these challenges, we propose PromptTSS, a novel framework
for time series segmentation with multi-granularity states. PromptTSS uses a unified
model with a prompting mechanism that leverages label and boundary information to
guide segmentation, capturing both coarse- and fine-grained patterns while adapting
dynamically to unseen patterns. Experiments show PromptTSS improves accuracy by 24.49%
in multi-granularity segmentation, 17.88% in single-granularity segmentation, and
up to 599.24% in transfer learning, demonstrating its adaptability to hierarchical
states and evolving time series dynamics. Our code is available at https://github.com/blacksnail789521/PromptTSS.
Multivariate Wind Power Time Series Forecasting with Noise-Filtering Neural ODEs
- Tianyu Chang
- Dongming Chen
- Dongqi Wang
In wind energy generation, wind power prediction traditionally relies on the simulation
of wind turbine operational data. Recently, many complex deep learning networks have
challenged the traditional paradigm. Multivariate Wind Power Time Series (MWTS) with
strong volatility are often non-uniformly sampled, making it difficult for traditional
time series methods to model. To address these challenges, our approach for MWTS forecasting
includes two modules: (1) The ODE-filter module uses a noise extraction network to
learn the Ordinary Differential Equations (ODE) continuous-time dynamics, which removes
high-frequency noise by treating the noise as a neural flow. The neural flow pushes
the ODE forward in time steps. (2) An adaptive decomposition module applies an unfixed
window driven by gradients to capture trend and seasonal components including more
abrupt change details. Our method models the precise wind power evolution, can naturally
forecast the multivariate wind power time series and reduce the interference from
noise and outliers in the data. Experimental results show that our approach outperforms
existing models on regularly and irregularly sampled MWTS.
Stamp: Semantic-Aware Sub-trajectory Anomaly Detection with Diffusion Multi-model
Pool for Evolving Data Streams
- Biao Chen
- Junhua Fang
- Pingfu Chao
- An Liu
- Pengpeng Zhao
- Lei Zhao
Trajectory anomaly detection, as a fundamental operation for moving object pattern
discovery, plays an irreplaceable and critical role in spatio-temporal location-based
services. Conducting online detection based on the current positions and their contextual
semantics can significantly enhance the value of trajectory data. However, existing
approaches suffer from two fundamental limitations: 1) they treat trajectories as indivisible
sequences or apply rigid segmentation strategies, and 2) they use a single detection
model that struggles to adapt to concept drift caused by evolving trajectory distributions.
Such limitations make it impossible to detect abnormal trajectories in a timely and
semantically comprehensive manner. To fill this gap, we propose Stamp, a novel framework
for Semantic-aware sub-Trajectory Anomaly detection with a diffusion Multi-model Pool.
In particular, Stamp comprises three key innovations: 1. It employs a semantic-driven
dynamic segmentation mechanism that identifies natural breakpoints in trajectories
based on changes in road semantics, rather than fixed rules. 2. It enhances trajectory
representation by embedding road network semantic vectors, capturing both spatial
geometry and functional urban characteristics. 3. It employs a pool of diffusion models
that dynamically evolves through reliability assessment, similarity measurement, and
strategic merging operations, ensuring adaptability to concept drift while leveraging
the superior generative capabilities of diffusion models over traditional autoencoders.
Experimental results demonstrate that Stamp improves detection efficiency by 35%,
AUPR by 5.6%, and F1-score by 2.7% on two large-scale real-world urban trajectory datasets when compared
to state-of-the-art methods, confirming its effectiveness for real-time anomaly
detection in complex urban environments.
Spreader Behavior Forecasting: Intent-aware Neural Processes for Intervening Misinformation
The behavior of spreaders on social media evolves continuously, driven by shifting
intentions and interactions with emerging news topics. Traditional approaches have
focused on identifying misinformation spreaders, but have often relied on a static
ground-truth label, limiting their applicability for implementing time-sensitive platform
interventions. In contrast, our work tackles spreader behavior forecasting through
an account-level credit score, modeling the temporal evolution of spreader behavior
to capture the intent shifts that drive misinformation spreading. To this end, we
propose a novel Intent-aware Neural Processes (INP) model, which focuses on tracking
the evolving intent of spreaders over time. The model leverages a state transition
structure and an intent state thinning algorithm to improve latent representations,
enabling more accurate predictions of future spreader behavior. Experimental results
on restructured datasets demonstrate the effectiveness of INP in identifying temporal
risk regions for proactive misinformation intervention.
Structure-prior Informed Diffusion Model for Graph Source Localization with Limited
Data
- Hongyi Chen
- Jingtao Ding
- Xiaojun Liang
- Yong Li
- Xiao-Ping Zhang
Source localization in graph information propagation is essential for mitigating network
disruptions, including misinformation spread, cyber threats, and infrastructure failures.
Existing deep generative approaches face significant challenges in real-world applications
due to limited propagation data availability. We present SIDSL (Structure-prior Informed Diffusion model for Source Localization), a generative diffusion framework that leverages topology-aware priors
to enable robust source localization with limited data. SIDSL addresses three key
challenges: unknown propagation patterns through structure-based source estimations
via graph label propagation, complex topology-propagation relationships via a propagation-enhanced
conditional denoiser with a GNN-parameterized label propagation module, and class imbalance
through structure-prior biased diffusion initialization. By learning pattern-invariant
features from synthetic data generated by established propagation models, SIDSL enables
effective knowledge transfer to real-world scenarios. Experimental evaluation on four
real-world datasets demonstrates superior performance with 7.5-13.3% F1 score improvements
over baselines, including over 19% improvement in few-shot and 40% in zero-shot settings,
validating the framework's effectiveness for practical source localization. Our code
can be found here (https://github.com/tsinghua-fib-lab/SIDSL).
HyperGenFL: Hypernetwork-Generated Model Aggregation in Federated Learning
- Jerry Chen
- Qikai Lu
- Ruiqing Tian
- Di Niu
- Baochun Li
Federated learning is a decentralized framework that enables client participation
in collaborative learning without centralized data collection. However, the framework
is susceptible to suboptimal model convergence induced by heterogeneity among the
client datasets. These discrepancies, including label imbalance, dissimilarity in
data distributions, and uneven data volumes between clients, may cause disagreements
among local client updates, affecting the ability of the global model to converge
effectively during aggregation. We suggest that one potential solution to this problem
lies in weighting the model aggregation by client importance and client-to-client
relationships. Based on this idea, we propose HyperGenFL (HG-FL), a hypernetwork that generates aggregation weights from learnable client embeddings
without requiring any training or benchmarking data. HG-FL utilizes the attention mechanism to capture inter-client relationships based on learnable
client-specific embeddings in order to generate model aggregation weights dynamically
during federated learning. By guiding the aggregation process with these learnable
relationships between local models, HG-FL reduces update conflicts and improves global model performance. We assess HG-FL under various data-heterogeneous environments based on different benchmark datasets
including Fashion-MNIST, CIFAR10, CIFAR100 and Tiny-ImageNet. Experimental results
demonstrate that HG-FL can achieve superior performance over a range of existing baseline methods under
challenging cases with various heterogeneous environments, large models and a large
number of clients.
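A minimal sketch of the described mechanism, under the assumption (ours, not the paper's) that learnable client embeddings are attended over and then scored into a softmax-normalized weight vector; layer sizes and the scoring head are illustrative only.

```python
import torch
import torch.nn as nn

class AggregationWeightGenerator(nn.Module):
    """Illustrative hypernetwork-style module: learnable per-client embeddings pass
    through self-attention to capture inter-client relationships; each attended
    embedding is scored and the scores are softmax-normalized into aggregation weights."""
    def __init__(self, num_clients: int, dim: int = 32):
        super().__init__()
        self.client_emb = nn.Parameter(torch.randn(num_clients, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self) -> torch.Tensor:
        e = self.client_emb.unsqueeze(0)                   # (1, num_clients, dim)
        h, _ = self.attn(e, e, e)                          # inter-client relationships
        return torch.softmax(self.score(h).squeeze(-1), dim=-1).squeeze(0)  # (num_clients,)
```

In a federated round, the server would then aggregate client updates as a weighted sum using these weights instead of the usual data-size-proportional FedAvg coefficients.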
ActiViz: Understanding Sample Selection in Active Learning through Boundary Visualization
- Jie Chen
- Honghui Du
- Dairui Liu
- Siteng Ma
- Brian Mac Namee
- Ruihai Dong
The performance of Active Learning (AL) methods varies widely, influenced by the query
strategy, model, and dataset, with the reasons for variation in performance still
unclear and insufficiently studied. Moreover, commonly used metrics like accuracy,
precision, and recall provide only limited analytical perspectives. No research has
effectively uncovered or explained the reasons behind these performance variations,
leaving a gap in understanding of the factors that influence the success or failure
of AL methods. To address this issue, we propose a novel method and tool leveraging
Voronoi Diagrams to visualize AL processes by illustrating interactions between classification
decision boundary changes and queried samples across AL iterations. We perform experiments
on synthetic and real-world datasets to validate the effectiveness of our method and
analyze various AL query strategies. By visualizing the AL process, we illustrate
how different query strategies progressively select samples and influence performance
in each iteration. This reveals the potential benefits of adapting query strategies
at different learning stages to improve active learning efficiency.
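A toy version of the visualization idea (not the ActiViz tool itself): given a 2-D projection of the pool, draw its Voronoi tessellation and mark the samples queried in one AL iteration; the function name and plotting choices are assumptions for the example.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.spatial import Voronoi, voronoi_plot_2d

def plot_queried_cells(X_2d: np.ndarray, queried: np.ndarray, labels: np.ndarray):
    """Draw the Voronoi tessellation of a 2-D (or 2-D-projected) pool and highlight
    the samples queried in the current AL iteration, so boundary changes between
    iterations can be inspected cell by cell."""
    vor = Voronoi(X_2d)
    fig, ax = plt.subplots(figsize=(6, 6))
    voronoi_plot_2d(vor, ax=ax, show_vertices=False, line_alpha=0.3, point_size=4)
    ax.scatter(X_2d[:, 0], X_2d[:, 1], c=labels, s=8, cmap="coolwarm")
    ax.scatter(X_2d[queried, 0], X_2d[queried, 1], marker="x", s=80, c="black",
               label="queried this iteration")
    ax.legend()
    return fig
```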
CCAgent: Coordinating Collaborative Data Scaling for Operating System Agents via Web3
- Liang Chen
- Haozhe Zhao
- Yinzhen Huang
- Yang Luo
- Tsekai Lin
- Weichu Xie
- Ruoyu Wu
- Peiyi Wang
- Runxin Xu
- Ming Wu
- Baobao Chang
The current AI revolution, fueled by Large Language Models (LLMs), heavily relies
on vast open-access internet data. However, the Operating System (OS) Agent field
faces a significant data sparsity challenge due to the lack of public data collection
systems and privacy concerns. To address this, we introduce CCAgent Net, a system
that coordinates and incentivizes internet users to contribute to scaling OS agent
datasets. Furthermore, we propose GUI-Pipe, an automated data post-processing pipeline
that evaluates, filters, and transforms raw user-uploaded OS data into trainable image-instruction-answer
data. This process results in CCAgent-Instruct, the largest instruction-based multi-platform
GUI agent dataset. Our experiments demonstrate CCAgent's effectiveness in advancing
OS Agent development. For instance, our CCAgent-GUI-3B model achieves a score of 33.7
(+127%) on the challenging out-of-domain Screenspots-Pro benchmark, significantly
outperforming other advanced open-source models like UI-TARS-2B, Qwen2.5-VL-3B/7B,
and UGround-V1-7B, even those of larger sizes. Our experiments also reveal the scaling
behaviors of GUI Agent training and offer insights for future directions.
M-LLM3REC: A Motivation-Aware User-Item Interaction Framework for Enhancing Recommendation
Accuracy with LLMs
- Lining Chen
- Qingwen Zeng
- Huaming Chen
Recommendation systems have been essential for both user experience and platform efficiency
by alleviating information overload and supporting decision-making. Traditional methods,
i.e., content-based filtering, collaborative filtering, and deep learning, have achieved
impressive results in recommendation systems. However, the cold-start and sparse-data
scenarios are still challenging to deal with. Existing solutions either generate pseudo-interaction
sequences, which often introduce redundant or noisy signals, or rely heavily on semantic
similarity, overlooking dynamic shifts in user motivation. To address these limitations,
this paper proposes a novel recommendation framework, termed M-LLM3REC, which leverages large language models for deep motivational signal extraction
from limited user interactions. M-LLM3REC comprises three integrated modules: the Motivation-Oriented Profile Extractor
(MOPE), Motivation-Oriented Trait Encoder (MOTE), and Motivational Alignment Recommender
(MAR). By emphasizing motivation-driven semantic modeling, M-LLM3REC demonstrates robust, personalized, and generalizable recommendations, particularly
boosting performance in cold-start situations in comparison with the state-of-the-art
frameworks.
Efficient Mask Learning for Language Model Fine-Tuning
- Minping Chen
- Ruijia Yang
- Zeyi Wen
Parameter-efficient fine-tuning (PEFT) of pre-trained language models (PLMs) has shown
promising results by updating significantly fewer parameters than full fine-tuning.
Masking-based fine-tuning is one type of PEFT method, which freezes the majority of
the model parameters during fine-tuning. Existing masking-based fine-tuning methods
either need to manually select the trainable parameters (heuristic-based), or perform
mask learning to adaptively select the trainable parameters with high memory and computation
cost. To address these problems, this paper proposes Low-Rank based Efficient Mask
Learning (LoReML). LoReML performs mask learning based on low-rank decomposition and
matrix reconstruction with a small ratio of new parameters. After mask learning, LoReML
uses the scaled intermediate results in mask learning as warm start initialization
to boost the model quality, then freezes the masked parameters accordingly, and fine-tunes
the PLM. Moreover, LoReML exploits sparse training techniques to enhance the memory
efficiency in masking-based fine-tuning. Experimental results across various tasks
and pre-trained backbones demonstrate that LoReML can notably outperform existing
heuristic-based methods. Moreover, LoReML achieves competitive or better performance
compared with the adaptive mask learning methods, while improving memory and computation
efficiency by over 50% in mask learning.
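For orientation only, the sketch below shows the generic masking-based fine-tuning step that this family of methods shares: a binary mask per weight tensor zeroes the gradients of frozen entries so that only a small fraction of parameters is updated. How LoReML learns its masks via low-rank decomposition is specific to the paper and not reproduced here; the magnitude-based mask construction is an assumption for the example.

```python
import torch

def mask_gradients(model: torch.nn.Module, masks: dict, sparsity: float = 0.99) -> dict:
    """Generic masking-based fine-tuning step (illustrative, not the LoReML algorithm):
    keep a binary mask per weight tensor and zero the gradients of masked-out (frozen)
    entries, so the optimizer only updates a small fraction of the parameters.
    Call after loss.backward() and before optimizer.step()."""
    for name, p in model.named_parameters():
        if name not in masks:
            # Assumption for the example: select trainable entries by weight magnitude.
            k = max(1, int(p.numel() * (1 - sparsity)))
            thresh = p.detach().abs().flatten().kthvalue(p.numel() - k + 1).values
            masks[name] = (p.detach().abs() >= thresh).float()
        if p.grad is not None:
            p.grad.mul_(masks[name])  # frozen entries receive zero gradient
    return masks
```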
Adaptive Heterogeneous Graph Neural Networks: Bridging Heterophily and Heterogeneity
Heterogeneous graphs (HGs) are common in real-world scenarios and often exhibit heterophily.
However, most existing studies focus on either heterogeneity or heterophily in isolation,
overlooking the prevalence of heterophilic HGs in practical applications. This oversight
leads to performance degradation. In this work, we first identify two main challenges
in modeling heterophilic HGs: (1) varying heterophily distributions across hops and
meta-paths; (2) the intricate and often heterophily-driven diversity of semantic information
across different meta-paths. Then, we propose the Adaptive Heterogeneous Graph Neural
Network (AHGNN) to tackle these challenges. AHGNN employs a heterophily-aware convolution
that accounts for heterophily distributions specific to both hops and meta-paths.
It then integrates messages from diverse semantic spaces using a coarse-to-fine attention
mechanism, which filters out noise and emphasizes informative signals. Experiments
on seven real-world graphs and twenty baselines demonstrate the superior performance
of AHGNN, particularly in high-heterophily situations.
Energy-Guided Diffusion Sampling for Long-Term User Behavior Prediction in Reinforcement
Learning-based Recommendation
- Xiaocong Chen
- Siyu Wang
- Lina Yao
Reinforcement learning-based recommender systems (RL4RS) have gained attention for
their ability to adapt to dynamic user preferences. However, these systems face challenges,
particularly in offline settings, where data inefficiency and reliance on pre-collected
trajectories limit their broader applicability. While offline reinforcement learning
methods leverage extensive datasets to address these issues, they often struggle with
noisy data and fail to capture long-term user preferences, resulting in suboptimal
recommendation policies. To overcome these limitations, we propose Diffusion-enhanced
Actor-Critic for Offline RL4RS (DAC4Rec), a novel framework that integrates diffusion
processes with reinforcement learning to model complex user preferences more effectively.
DAC4Rec leverages the denoising capabilities of diffusion models to enhance the robustness
of offline RL algorithms and incorporates a Q-value-guided policy optimization strategy
to better handle suboptimal trajectories. Additionally, we introduce an energy-based
sampling strategy to reduce randomness during recommendation generation, ensuring
more targeted and reliable outcomes. We validate the effectiveness of DAC4Rec through
extensive experiments on six real-world offline datasets and in an online simulation
environment, demonstrating its ability to optimize long-term user preferences. Furthermore,
we show that the proposed diffusion policy can be seamlessly integrated into other
commonly used RL algorithms in RL4RS, highlighting its versatility and wide applicability.
Maximum In-Support Return Modeling for Dynamic Recommendation with Language Model
Prior
- Xiaocong Chen
- Siyu Wang
- Lina Yao
Reinforcement Learning-based recommender systems (RLRS) offer an effective way to
handle sequential recommendation tasks but often face difficulties in real-world settings,
where user feedback data can be sub-optimal or sparse. In this paper, we introduce
MDT4Rec, an offline RLRS framework that builds on the Decision Transformer (DT) to
address two major challenges: learning from sub-optimal histories and representing
complex user-item interactions. First, MDT4Rec shifts the trajectory stitching procedure
from the training phase to action inference, allowing the system to shorten its historical
context when necessary and thereby ignore negative or unsuccessful past experiences.
Second, MDT4Rec initializes DT with a pre-trained large language model (LLM) for knowledge
transfer, replaces linear embedding layers with Multi-Layer Perceptrons (MLPs) for
more flexible representations, and employs Low-Rank Adaptation (LoRA) to efficiently
fine-tune only a small subset of parameters. We evaluate MDT4Rec on five public datasets
and in an online simulation environment, demonstrating that it outperforms existing
methods.
Target Item-oriented Conditional Diffusion Differential Transformer for Next-Item
Prediction
- Xiaoqing Chen
- Zitao Xu
- Weike Pan
- Zhong Ming
Sequential recommendation (SR) aims to capture users' dynamic preferences based on
their historical interactions and provide personalized next-item prediction. Multi-behavior
SR (MBSR) further considers behavior types of user-item interactions, which can reveal
diverse user interests and alleviate the data sparsity issue w.r.t. the target purchase
behaviors. Most existing MBSR approaches ignore the importance of target items closely
related to user interests. Moreover, they often suffer from the problem of limited
vector representation capability. To tackle the above two challenges, we propose a
novel solution called target item-oriented conditional diffusion differential Transformer
(ICDDT). Specifically, our ICDDT introduces distribution representations via the diffusion
model, allowing effective utilization of target item information during training to
better capture user preferences. Firstly, our ICDDT achieves a more appropriate behavior-aware
step selection in the diffusion phase by distinguishing the sampling distributions
of diffusion steps w.r.t. behavior types. Secondly, our ICDDT introduces three conditions
of interaction sequences, target behaviors and diffusion steps into the reverse phase
to guide the training of the differential Transformer-based approximator, generating
denoised target item representations as user personalized interests. Finally, our
ICDDT sets an inference step truncation factor to fit the diffusion step sampling
distributions and accelerate the inference process. We conduct extensive experiments
on two real-world datasets, where the results show that our ICDDT significantly outperforms
all baselines on all metrics. The datasets, source codes and scripts are available
at https://github.com/Erin-Gr/ICDDT.
ConsensNet: A Unified Consensus-Centric Framework for Incomplete Multi-View Clustering
- Yifei Chen
- Xiaolin Xiao
- Yue-Jiao Gong
Incomplete Multi-View Clustering (IMVC) addresses the challenge of missing data by
leveraging available information and effectively mining cross-view relationships.
While contrastive learning has recently been introduced into IMVC for discriminative
representation learning, existing methods typically adopt pairwise contrastive strategies
with view-specific reconstruction and heuristic fusion schemes. These approaches are
generally suboptimal when facing high missing-view ratios and struggle to capture
latent cross-view dependencies. To overcome these limitations, we propose ConsensNet,
a unified consensus-centric framework for IMVC. This is achieved through a unified
architecture that integrates contrastive cross-view alignment, consensus prediction,
and attention-aware fusion. By aligning all available views into a shared semantic
space, ConsensNet effectively captures latent cross-view dependencies without requiring
high-quality view completion. Moreover, the attention-aware fusion mechanism dynamically
assigns weights to each view based on its relevance to the consensus, thereby reducing
the impact of noisy or weakly correlated views. Extensive experiments on multiple
datasets demonstrate that ConsensNet consistently outperforms state-of-the-art IMVC
methods, particularly under high missing-view scenarios, highlighting its robustness
and practical significance.
NR-GCF: Graph Collaborative Filtering with Improved Noise Resistance
- Yijun Chen
- Bohan Li
- Yicong Li
- Lixiang Song
- Haofen Wang
- Wenlong Wu
- Junnan Zhuo
- Hongzhi Yin
Graph Neural Networks (GNNs) have emerged as the preferred backbone model of recommender
systems, credited to their strong capability in capturing the intricate topological
relationships within user-item interactions. Nevertheless, a common oversight in existing
studies is the presumption of the inherent reliability of these interactions, ignoring
the reality that a significant fraction of user-item engagements, such as accidental
clicks, are inherently noisy. Extensive studies have revealed that GNN is vulnerable
to such noisy edges within the graph-structured data, as those noisy edges can mislead
the network into overfitting incorrect patterns of interactions, thereby propagating
such incorrect information through the entire interaction network. To address these
challenges, in this paper, we propose a novel noise-robust GNN-based training strategy
for recommendation, known as Noise-Resistant Graph Collaborative Filtering (NR-GCF).
NR-GCF innovatively adopts a two-stage learning paradigm to filter out unreliable
interactions, leveraging the memorization effect of GNNs. It further utilizes representation modulation to learn noise-resistant embeddings,
enhancing robustness for recommendation tasks. Comprehensive experiments and ablation
studies demonstrate the effectiveness and robustness of the proposed NR-GCF. Our implementation
has been made available at:https://github.com/1197151063/NRGCF.git
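The abstract does not spell out the filtering step, but the memorization effect it leverages is commonly exploited through small-loss selection: a warm-up model's per-interaction loss flags likely-noisy edges. The sketch below is a generic, hypothetical illustration of that idea, not NR-GCF's actual procedure (see the linked repository for that); all function and variable names are our own.

```python
import numpy as np

def filter_by_small_loss(scores, labels, keep_ratio=0.8):
    """Generic small-loss filtering: interactions that a warm-up model fits
    poorly (high loss) are treated as likely noisy and dropped. `scores` are
    predicted probabilities for observed interactions; names are illustrative."""
    eps = 1e-8
    losses = -(labels * np.log(scores + eps)
               + (1 - labels) * np.log(1 - scores + eps))
    # Memorization effect: clean patterns tend to be fit earlier in training,
    # so keep the fraction of interactions with the smallest loss.
    threshold = np.quantile(losses, keep_ratio)
    return losses <= threshold

# Toy usage with random predictions for ten observed (positive) interactions.
rng = np.random.default_rng(0)
scores = rng.uniform(0.05, 0.95, size=10)
labels = np.ones(10)
mask = filter_by_small_loss(scores, labels, keep_ratio=0.7)
print(int(mask.sum()), "of", len(mask), "interactions kept")
```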
High-Context Empathy in Conversations for Large Language Models
- Yuyan Chen
- Lei Xia
- Jinghan Cao
- Zhendong Hou
- Weinan Dai
- Zhixu Li
Large Language Models (LLMs) exhibit remarkable capabilities across various downstream
tasks, including empathetic dialogues. However, a non-trivial question arises: Do
they possess high-context empathy and can they generate emotional interactions with
humans? High-context empathy, which tends to be more indirect and concise like Chinese-style
empathy, differs from the current empathy capabilities of LLMs. These capabilities
are predominantly low-context empathy, which is often direct and lengthy, resembling
English-style empathy. In this paper, we first construct a comprehensive Chinese High-context Empathy Dialogue dataset (HED), which consists of emotional, role-based emotional, personality-based
emotional, and role-personality-based emotional dialogues. Next, we explore whether
LLMs have high-context empathy in conversations. After that, we propose an innovative
High-context Empathy Network (HEN) to improve LLMs' capabilities in generating high-context empathetic responses.
Our empirical study demonstrates that LLMs still have considerable room for improvement
in generating high-context empathetic responses, and that the proposed HEN not only
significantly improves LLMs' capabilities in generating such responses but also benefits
LLMs on similar sentiment-related tasks.
Rethinking Client-oriented Federated Graph Learning
- Zekai Chen
- Xunkai Li
- Yinlin Zhu
- Rong-Hua Li
- Guoren Wang
As a new distributed graph learning paradigm, Federated Graph Learning (FGL) facilitates
collaborative model training across local systems while preserving data privacy. We
review existing FGL approaches and categorize their optimization mechanisms into:
(1) Server-Client (S-C), where clients upload local model parameters for server-side
aggregation and global updates; (2) Client-Client (C-C), which allows clients to exchange
information directly and customize their local training processes. We reveal
that C-C shows superior potential due to its refined communication structure. However,
existing C-C methods broadcast redundant node representations, incurring high communication
costs and privacy risks at the node level. To this end, we propose FedC4, which combines
graph Condensation with C-C Collaboration optimization. Specifically, FedC4 employs
a graph condensation technique to distill the knowledge of each client's graph into a
few synthetic embeddings instead of transmitting node-level knowledge. Moreover, FedC4
introduces three novel modules that allow the source client to send distinct node
representations tailored to the target client's graph properties. Experiments on eight
public real-world datasets show that FedC4 outperforms state-of-the-art baselines
in both task performance and communication cost. Our code is available at https://github.com/Ereshkigal1/FedC4.
Hypercomplex Prompt-aware Multimodal Recommendation
- Zheyu Chen
- Jinfeng Xu
- Hewei Wang
- Shuo Yang
- Zitong Wan
- Haibo Hu
Modern recommender systems face critical challenges in handling information overload
while addressing the inherent limitations of multimodal representation learning. Existing
methods suffer from three fundamental limitations: (1) a restricted ability to represent
rich multimodal features through a single representation, (2) linear modality fusion
strategies that ignore the deep nonlinear correlations between modalities, and (3)
static optimization methods that fail to dynamically mitigate the over-smoothing problem
in graph convolutional networks (GCNs). To overcome these limitations, we propose HPMRec, a novel Hypercomplex Prompt-aware Multimodal Recommendation framework, which utilizes hypercomplex embeddings in the form of multi-components
to enhance the representation diversity of multimodal features. HPMRec adopts the
hypercomplex multiplication to naturally establish nonlinear cross-modality interactions
to bridge semantic gaps, which is beneficial to explore the cross-modality features.
HPMRec also introduces a prompt-aware compensation mechanism to mitigate the misalignment
between components and the loss of modality-specific features, and this mechanism fundamentally
alleviates the over-smoothing problem. It further designs self-supervised learning
tasks that enhance representation diversity and align different modalities. Extensive
experiments on four public datasets show that HPMRec achieves state-of-the-art recommendation
performance.
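For readers unfamiliar with hypercomplex algebra, the snippet below illustrates the quaternion Hamilton product in NumPy, one standard instance of hypercomplex multiplication in which every output component mixes all input components. It is a generic illustration of the operation the abstract refers to, not HPMRec's implementation; the array shapes and names are our own assumptions.

```python
import numpy as np

def hamilton_product(p, q):
    """Hamilton product of two quaternion-valued embeddings.
    p, q: arrays of shape (..., 4) holding (r, i, j, k) components.
    Each output component mixes all input components, which is the kind of
    nonlinear cross-component interaction hypercomplex multiplication provides."""
    r1, i1, j1, k1 = np.moveaxis(p, -1, 0)
    r2, i2, j2, k2 = np.moveaxis(q, -1, 0)
    return np.stack([
        r1 * r2 - i1 * i2 - j1 * j2 - k1 * k2,
        r1 * i2 + i1 * r2 + j1 * k2 - k1 * j2,
        r1 * j2 - i1 * k2 + j1 * r2 + k1 * i2,
        r1 * k2 + i1 * j2 - j1 * i2 + k1 * r2,
    ], axis=-1)

# Toy usage: combine a 4-component visual embedding with a textual one.
rng = np.random.default_rng(0)
visual = rng.normal(size=(8, 4))    # 8 items, 4 hypercomplex components
textual = rng.normal(size=(8, 4))
fused = hamilton_product(visual, textual)
print(fused.shape)                  # (8, 4)
```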
FROG: Fair Removal on Graph
- Ziheng Chen
- Jiali Cheng
- Hadi Amiri
- Kaushiki Nag
- Lu Lin
- Sijia Liu
- Gabriele Tolomei
- Xiangguo Sun
With growing emphasis on privacy regulations, machine unlearning has become increasingly critical in real-world applications such as social networks
and recommender systems, many of which are naturally represented as graphs. However,
existing graph unlearning methods often modify nodes or edges indiscriminately, overlooking
their impact on fairness. For instance, forgetting links between users of different
genders may inadvertently exacerbate group disparities. To address this issue, we
propose a novel framework that jointly optimizes both the graph structure and the
model to achieve fair unlearning. Our method rewires the graph by removing redundant edges that hinder
forgetting while preserving fairness through targeted edge augmentation. We further
introduce a worst-case evaluation mechanism to assess robustness under challenging
scenarios. Experiments on real-world datasets show that our approach achieves more
effective and fair unlearning than existing baselines.
Advancing Temporal Sensitive Question Answering through Progressive Multi-Step Reflection
- Ziyang Chen
- Erxue Min
- Xiang Zhao
- Yunxin Li
- Xin Jia
- Jinzhi Liao
- Shuaiqiang Wang
- Baotian Hu
- Dawei Yin
Retrieval-augmented generation (RAG) has demonstrated strong potential in enhancing
large language models (LLMs) for complex, real-world question answering. However,
existing RAG frameworks remain inadequate for temporal scenarios, primarily due to
their inability to jointly model temporal constraints in both retrieval and reasoning.
On the retrieval side, traditional approaches focus on semantic similarity, often
returning outdated or temporally misaligned evidence. On the generation side, these
systems frequently produce factually incorrect or hallucinated answers when confronted
with incomplete or temporally inconsistent information. Motivated by the observed
limitations, we propose ChronoReflect+, a temporal logic-aware RAG framework that
incorporates hybrid temporal-aware retrieval and progressive multi-step reflection.
Our method iteratively refines both retrieval and reasoning, identifying and bridging
information gaps as context accumulates. Extensive experiments demonstrate that ChronoReflect+
significantly outperforms state-of-the-art RAG baselines, improving end-to-end accuracy
by 15.2%, particularly on questions involving implicit time expressions and multi-hop
reasoning.
Evolving Graph-Based Context Modeling for Multi-Turn Conversational Retrieval-Augmented
Generation
- Yiruo Cheng
- Hongjin Qian
- Fengran Mo
- Yongkang Wu
- Zhonghua Li
- Qi Ye
- Ji-Rong Wen
- Zhicheng Dou
Conversational Retrieval-Augmented Generation (RAG) systems enhance user interactions
by integrating large language models (LLMs) with external knowledge retrieval. However,
multi-turn conversations present significant challenges, including implicit user intent
and noisy context, which hinder accurate retrieval and response generation. Existing
approaches often struggle with the unstructured conversational context and fail to
model explicit relations among conversational turns. Moreover, they do not leverage
historically relevant passages effectively. To overcome these limitations, we propose
EvoRAG, a novel framework that maintains an evolving knowledge graph aligned with
the unstructured conversational context. This graph explicitly captures relations
among user queries, system responses, and relevant passages across conversational
turns, serving as a structured representation of the context. EvoRAG includes three
key components: (1) a dual-path retrieval module for context denoising, (2) a unified
knowledge integration module for query rewriting and summarization, and (3) a graph-enhanced
RAG module for accurate retrieval and response generation. Experiments on four public
conversational RAG datasets show that EvoRAG significantly outperforms strong baselines,
particularly in handling topic shifts and long dialogue contexts.
PP-STAT: An Efficient Privacy-Preserving Statistical Analysis Framework using Homomorphic
Encryption
With the widespread adoption of cloud computing, the need for outsourcing statistical
analysis to third-party platforms is growing rapidly. However, handling sensitive
data such as medical records and financial information in cloud environments raises
serious privacy concerns. In this paper, we present PP-STAT, a novel and efficient Homomorphic Encryption (HE)-based framework for privacy-preserving
statistical analysis. HE enables computations to be performed directly on encrypted
data without revealing the underlying plaintext. PP-STAT supports advanced statistical measures, including Z-score normalization, skewness,
kurtosis, coefficient of variation, and Pearson correlation coefficient, all computed
securely over encrypted data. To improve efficiency, PP-STAT introduces two key optimizations: (1) a Chebyshev-based approximation strategy for
initializing inverse square root operations, and (2) a pre-normalization scaling technique
that reduces multiplicative depth by folding constant scaling factors into mean and
variance computations. These techniques significantly lower computational overhead
and minimize the number of expensive bootstrapping procedures. Our evaluation on real-world
datasets demonstrates that PP-STAT achieves high numerical accuracy, with mean relative error (MRE) below 2.4×10⁻⁴.
Notably, the encrypted Pearson correlation coefficient between the smoker and charges
attributes reaches 0.7873, with an MRE of 2.86×10⁻⁴. These results confirm the practical utility of PP-STAT for secure and precise statistical analysis in privacy-sensitive domains.
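As a rough plaintext illustration of the kind of optimization described above (not PP-STAT's encrypted implementation), the sketch below fits a low-degree Chebyshev polynomial to the inverse square root on an assumed input domain and refines it with Newton iterations; both steps use only additions and multiplications, which is what makes such approximations HE-friendly. The domain, polynomial degree, and names are our assumptions.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Fit a low-degree Chebyshev polynomial to 1/sqrt(x) on an assumed domain;
# polynomial evaluation needs only additions and multiplications, the
# operations a levelled HE scheme supports natively.
lo, hi = 0.1, 10.0
xs = np.linspace(lo, hi, 2000)
true = 1.0 / np.sqrt(xs)
coeffs = C.chebfit(xs, true, deg=15)
approx = C.chebval(xs, coeffs)
print("max relative error of the polynomial seed:",
      np.max(np.abs(approx - true) / true))

# The polynomial value can seed Newton iterations y <- y * (1.5 - 0.5 * x * y^2),
# which again use only additions and multiplications.
y = approx
for _ in range(2):
    y = y * (1.5 - 0.5 * xs * y * y)
print("max relative error after two refinement steps:",
      np.max(np.abs(y - true) / true))
```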
DYCOR: Capturing Hidden Stock Relationships for Stock Trend Prediction
- Kangmin Choi
- Geon Shin
- Jungwoo Yang
- Hyunjoon Kim
Stock trend prediction, the task of forecasting future trends of stocks from their
historical feature sequences, remains highly challenging due to the complex and dynamic
nature of financial markets. In reality, stocks form diverse relationships that transcend
traditional sector boundaries as market conditions evolve, i.e., stocks within the
same sector may display different trends, while those in different sectors often exhibit
similar movements. However, most existing stock prediction methods rely on predefined
static relationships, lacking flexibility to adapt to changing market dynamics. Furthermore,
objectives widely adopted in prior work have limitations in capturing complex patterns
and relationships in stock market data. To address these limitations, we propose DYCOR, a novel stock trend prediction method that integrates two key innovations: (i) dynamic
stock clustering, which captures market characteristics without relying on predefined
relationship data by adaptively discovering hidden stock relationships; and (ii) correlation-aware
training, which aligns predicted and ground-truth stock trends by reflecting their
correlations in a fine-grained manner. We evaluate DYCOR on three datasets widely used
in existing research (NASDAQ, NYSE, and S&P 500), where it demonstrates superior
performance on both correlation-based and retrieval-based metrics compared to state-of-the-art
baselines, while maintaining competitive runtime efficiency.
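The abstract does not give the exact objective, but correlation-aware training objectives are often built from a Pearson-correlation term computed across stocks at each time step. The sketch below is a generic, hypothetical PyTorch loss of that kind, not DYCOR's formulation; shapes and names are illustrative.

```python
import torch

def correlation_loss(pred, target, eps=1e-8):
    """1 - Pearson correlation between predicted and ground-truth returns,
    computed across stocks for each time step, then averaged.
    pred, target: tensors of shape (time_steps, num_stocks).
    A purely illustrative objective, not DYCOR's exact formulation."""
    pred_c = pred - pred.mean(dim=1, keepdim=True)
    targ_c = target - target.mean(dim=1, keepdim=True)
    cov = (pred_c * targ_c).sum(dim=1)
    denom = pred_c.norm(dim=1) * targ_c.norm(dim=1) + eps
    corr = cov / denom
    return (1.0 - corr).mean()

# Toy usage: random predictions vs. ground truth for 5 days x 30 stocks.
torch.manual_seed(0)
pred = torch.randn(5, 30, requires_grad=True)
target = torch.randn(5, 30)
loss = correlation_loss(pred, target)
loss.backward()
print(float(loss))
```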
Learnable Orthogonal Decomposition for Non-Regressive Prediction for PDE
- Yun Young Choi
- Kyujin Han
- Joohwan Ko
- Sangwook Baek
- Seunghwan Lee
Modeling the spatio-temporal evolution of complex physical systems remains a fundamental
challenge in both deep learning and scientific computing. While recent methods such
as Transformers and Neural Operators have shown promise in learning PDE solutions,
their reliance on auto-regressive forecasting often increases computational overhead
and accumulates prediction errors over time. In this paper, we propose Learnable Orthogonal
Decomposition (LOD), a non-regressive framework that integrates ideas from classical
Proper Orthogonal Decomposition (POD) with modern deep learning. LOD first performs
parameter-wise POD: at each time step, we apply POD to an ensemble of PDE solutions
generated under different physical parameters, yielding a time-indexed set of orthonormal
spatial bases. These bases initialize a learnable dictionary and are refined during
end-to-end training. Given only a short prefix of initial conditions for a target
parameter setting, a neural encoder predicts, in a single shot, the entire trajectory
of parameter-wise POD coefficients. The final solution field is reconstructed by combining
the predicted coefficients with the learned bases, avoiding error accumulation inherent
to auto-regressive strategies. Comprehensive experiments on various PDE benchmark
datasets demonstrate that LOD achieves state-of-the-art accuracy while significantly
reducing computational costs.
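For background on the classical component LOD builds on, the sketch below computes a Proper Orthogonal Decomposition of an ensemble of toy PDE snapshots via the SVD and reconstructs the snapshots from a truncated basis. This is textbook POD in NumPy, not the LOD implementation; the toy data and names are our own.

```python
import numpy as np

def pod_basis(snapshots, rank):
    """Classical Proper Orthogonal Decomposition via SVD.
    snapshots: array of shape (num_params, num_spatial_points), one PDE
    solution per physical-parameter setting at a fixed time step.
    Returns `rank` orthonormal spatial modes and the projection coefficients."""
    # Rows are snapshots; right singular vectors give orthonormal spatial modes.
    _, s, vt = np.linalg.svd(snapshots, full_matrices=False)
    modes = vt[:rank]                    # (rank, num_spatial_points)
    coeffs = snapshots @ modes.T         # (num_params, rank)
    return modes, coeffs, s

# Toy ensemble: 1D diffusion-like profiles for 20 parameter settings.
x = np.linspace(0, 1, 200)
params = np.linspace(0.5, 2.0, 20)
snapshots = np.array([np.exp(-p * (x - 0.5) ** 2) for p in params])

modes, coeffs, s = pod_basis(snapshots, rank=4)
recon = coeffs @ modes
print("relative reconstruction error:",
      np.linalg.norm(recon - snapshots) / np.linalg.norm(snapshots))
```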
BrainX: A Universal Brain Decoding Framework with Feature Disentanglement and Neuro-Geometric
Representation Learning
- Zheng Cui
- Dong Nie
- Pengcheng Xue
- Xia Wu
- Daoqiang Zhang
- Xuyun Wen
Decoding visual stimuli from human brain activity is a fundamental challenge in cognitive
neuroscience and neuroimaging. While recent advances in deep learning have significantly
improved the performance of fMRI-to-image decoding, most existing methods overlook
the issue of inter-subject variability in fMRI data, which leads to poor generalization
across subjects. Current approaches often rely on partially shared model architectures
that offer limited generalization and still require subject-specific components, restricting
their applicability to unseen subjects. To address this limitation, we propose BrainX,
a universal brain decoding framework that constructs a unified fMRI encoder and image
generator to achieve subject-agnostic modeling. Specifically, we introduce a feature
disentanglement mechanism that extracts subject-shared features from the fMRI embeddings,
which are then fed into the image generator to reconstruct visual stimuli. This design
eliminates the need for subject-specific models and significantly enhances cross-subject
generalization. Additionally, we develop a neuro-geometric fMRI representation learning
method that projects 3D cortical structures onto a 2D surface space, effectively mitigating
the inaccuracies caused by imprecise geodesic distance estimation in 3D Euclidean
space. Extensive experiments on the Natural Scenes Dataset (NSD) demonstrate that
BrainX consistently outperforms existing state-of-the-art methods across three decoding
settings: within-subject, cross-subject with finetuning, and cross-subject without
finetuning.
FedGVD: Efficient Federated Graph Learning via Unidirectional Distillation with Dynamic
Virtual Nodes
- Zhehao Dai
- Guojiang Shen
- Yuyue Hu
- Jiaxin Du
- Xiao Han
- Xiangjie Kong
Federated Graph Learning (FGL) has emerged as a key paradigm for distributed graph
machine learning, enabling cross-domain graph collaborative modeling while preserving
data privacy. However, existing methods face two major bottlenecks: the structural
heterogeneity of graph data among clients weakens the generalization ability
of the global model, and model heterogeneity leads to inefficient knowledge sharing
and complex global aggregation. To address these issues, we propose FedGVD, an efficient
framework that constructs a global perspective through data condensation and server-side
virtual node generation, which not only preserves the semantic equivalence of the
original data but also avoids privacy leakage. Subsequently, by distributing low-dimensional
generalizable knowledge for unidirectional distillation, FedGVD enables local models
to absorb global knowledge without transmitting local parameters, thereby overcoming
the challenges of data and structural heterogeneity as well as model heterogeneity.
This innovative approach ensures privacy-preserving and efficient federated graph
collaboration. Experiments show that FedGVD maintains excellent performance in heterogeneous
model scenarios while significantly improving communication efficiency, offering a
new approach for privacy-preserving collaborative modeling in FGL. The code is available
at https://github.com/Jasonxx4/FedGVD.
TFMAdapter: Lightweight Instance-Level Adaptation of Foundation Models for Forecasting
with Covariates
- Afrin Dange
- Sunita Sarawagi
Time Series Foundation Models (TSFMs) have recently achieved state-of-the-art performance
in univariate forecasting on new time series simply by conditioning on a brief history
of past values. Their success demonstrates that large-scale pretraining across diverse
domains can acquire the inductive bias to generalize from temporal patterns in a brief
history. However, most TSFMs are unable to leverage covariates (future-available exogenous
variables critical for accurate forecasting in many applications) due to their domain-specific
nature and the lack of associated inductive bias.
We propose TFMAdapter, a lightweight, instance-level adapter that augments TSFMs with
covariate information without fine-tuning. Instead of retraining, TFMAdapter operates
on the limited history provided during a single model call, learning a non-parametric
cascade that combines covariates with univariate TSFM forecasts. However, such learning
would require univariate forecasts at all steps in the history, requiring too many
calls to the TSFM. To enable training on the full historical context while limiting
TSFM invocations, TFMAdapter uses a two-stage method: (1) generating pseudo-forecasts
with a simple regression model, and (2) training a Gaussian Process regressor to refine
predictions using both pseudo- and TSFM forecasts alongside covariates.
Extensive experiments on real-world datasets demonstrate that TFMAdapter consistently
outperforms both foundation models and supervised baselines, achieving a 24-27% improvement
over base foundation models with minimal data and computational overhead. Our results
highlight the potential of lightweight adapters to bridge the gap between generic
foundation models and domain-specific forecasting needs.
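A minimal sketch of the two-stage idea, under our own assumptions about data layout and with the TSFM forecasts omitted (only the cheap pseudo-forecasts are used), is shown below using scikit-learn. It illustrates the pipeline shape described in the abstract, not TFMAdapter's actual adapter.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

# Toy series: the target depends on its recent past plus a known covariate.
T = 200
covariate = np.sin(np.arange(T) / 10.0)
y = 0.7 * covariate + 0.1 * rng.normal(size=T)

def lagged_features(series, lags=3):
    return np.stack([series[i:len(series) - lags + i] for i in range(lags)], axis=1)

lags = 3
X_hist = lagged_features(y, lags)            # (T - lags, lags)
y_hist = y[lags:]

# Stage 1: a cheap regression model produces pseudo-forecasts over the full
# history, standing in for univariate forecasts we could not afford to request
# from the TSFM at every step.
stage1 = LinearRegression().fit(X_hist, y_hist)
pseudo_forecast = stage1.predict(X_hist)

# Stage 2: a Gaussian Process regressor learns to refine forecasts using the
# covariate (in the paper's setting, real TSFM forecasts at a few anchor steps
# would be mixed in as well; we keep it simple here).
Z = np.column_stack([pseudo_forecast, covariate[lags:]])
stage2 = GaussianProcessRegressor().fit(Z, y_hist)

refined = stage2.predict(Z)
print("MAE before refinement:", np.mean(np.abs(pseudo_forecast - y_hist)))
print("MAE after refinement: ", np.mean(np.abs(refined - y_hist)))
```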
ExplorAct: Context-Aware Next Action Recommendations for Interactive Data Exploration
- Dinuka Manohara de Zoysa
- James Bailey
- Renata Borovica-Gajic
Modern data analysis platforms, such as Tableau, Microsoft Power BI, Google Looker
Studio, Kibana, and Splunk, have democratized data exploration by enabling users to
interact with data through intuitive visual interfaces, eliminating the need for proficiency
in query languages like SQL. These platforms allow both experts and non-experts to
perform high-level operations and incrementally construct complex analysis workflows.
As the volume and complexity of data grow, assisting users in navigating these workflows
becomes increasingly important. One promising direction is to provide intelligent
next-action recommendations that guide users through meaningful and efficient exploration
paths.
In this paper, we present ExplorAct, a context-aware next-action recommendation framework
that leverages historical session logs to predict and suggest relevant next steps
during data exploration. Unlike existing approaches that suffer from scalability issues
due to log-size-dependent retrieval, ExplorAct achieves constant-time inference by
employing a deep learning architecture that models both the structural and sequential
aspects of exploration sessions. Through extensive experiments on four real-world
datasets, we show that ExplorAct consistently outperforms state-of-the-art (SOTA)
baselines across three core recommendation tasks, while maintaining stable and low-latency
inference regardless of log size.
Correlation-aware Online Change Point Detection
- Chengyuan Deng
- Zhengzhang Chen
- Xujiang Zhao
- Haoyu Wang
- Junxiang Wang
- Jie Gao
- Haifeng Chen
Change point detection aims to identify abrupt shifts occurring at multiple points
within a data sequence. This task becomes particularly challenging in the online setting,
where different types of change can occur, including shifts in both the marginal and
joint distributions of the data. In this paper, we address these challenges by tracking
the Riemannian geometry of correlation matrices, using Riemannian metrics to compute
geodesic distances as an accurate measure of correlation dynamics.
We introduce Rio-CPD, a correlation-aware online change point detection framework that integrates
the Riemannian geometry of the manifold of symmetric positive definite matrices with
the cumulative sum (CUSUM) statistic for detecting change points. Rio-CPD employs a novel CUSUM design by computing the geodesic distance between current
observations and the Fréchet mean of prior observations. With appropriate choices
of Riemannian metrics, Rio-CPD offers a simple yet effective and computationally efficient algorithm. We also
provide a theoretical analysis on standard metrics for change point detection within
Rio-CPD. Experimental results on both synthetic and real-world datasets demonstrate
that Rio-CPD outperforms existing methods on detection accuracy, average detection
delay, and efficiency.
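Correlation matrices are symmetric positive definite after mild regularization, and a common Riemannian choice for such matrices is the log-Euclidean metric, under which the geodesic distance is the Frobenius norm of the difference of matrix logarithms and the Fréchet mean is the matrix exponential of the averaged logarithms. The sketch below combines these ingredients with a simple CUSUM-style statistic as a rough illustration of the recipe; it is not Rio-CPD's exact design, and the metric choice, window size, and drift term are our assumptions.

```python
import numpy as np

def corr_matrix(window, ridge=1e-3):
    """Correlation matrix of a (time, variables) window, nudged to stay SPD."""
    c = np.corrcoef(window, rowvar=False)
    return c + ridge * np.eye(c.shape[0])

def spd_log(m):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.T

def spd_exp(m):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * np.exp(w)) @ v.T

def log_euclidean_dist(a, b):
    """Geodesic distance between SPD matrices under the log-Euclidean metric."""
    return np.linalg.norm(spd_log(a) - spd_log(b), ord="fro")

def frechet_mean(mats):
    """Fréchet mean under the log-Euclidean metric: exp of the mean of logs."""
    return spd_exp(np.mean([spd_log(m) for m in mats], axis=0))

# Toy stream whose correlation structure shifts halfway through.
rng = np.random.default_rng(0)
n, d, win, drift = 400, 4, 40, 0.05
x = rng.normal(size=(n, d))
x[n // 2:, 1] += 0.9 * x[n // 2:, 0]           # induce a correlation change

cusum, history = 0.0, []
for start in range(0, n - win + 1, win):
    c = corr_matrix(x[start:start + win])
    if len(history) >= 3:
        d_geo = log_euclidean_dist(c, frechet_mean(history))
        cusum = max(0.0, cusum + d_geo - drift)   # CUSUM-style accumulation
        print(f"t={start:3d}  geodesic={d_geo:.3f}  cusum={cusum:.3f}")
    history.append(c)
```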
Mitigating Latent Confounding Bias in Recommender Systems
- Jianfeng Deng
- Qingfeng Chen
- Debo Cheng
- Xiaojing Du
- Jiuyong Li
- Lin Liu
Recommender systems are crucial for providing personalised experiences, but their
effectiveness is often undermined by confounding bias, particularly in the presence
of latent confounders. Existing debiasing methods typically address only one type
of latent confounding bias, often ignoring the complex interactions caused by latent
confounders, such as those between items and user feedback, and between item exposure
and user feedback. To tackle these challenges, we propose a novel Deep Instrumental
Variables (IV) approach for debiased representation learning in Recommendation Systems,
referred to as DIVERS. Specifically, DIVERS leverages user feature embeddings as IVs
to mitigate the confounding bias between items and user feedback caused by latent
confounders, and combines the debiased item embeddings with an item exposure vector
to generate a reconstructed item exposure vector. Moreover, DIVERS employs an identifiable
Variational Auto-Encoder (iVAE) to infer identifiable representations by utilising
information from both the original and reconstructed item exposure vectors, effectively
addressing the confounding bias introduced by latent confounders between item exposure
and user feedback. Additionally, we provide theoretical analyses to demonstrate the
soundness of using IV and the identifiability of the representation learned by DIVERS.
Extensive experiments on both synthetic and real-world datasets confirm that DIVERS
outperforms state-of-the-art models in reducing bias and providing reliable recommendations.
Our source code is available at: https://github.com/djf-web/DIVERS.
DDE-CLIP: Detail-Guided Dual-Modal Enhancement for Zero-Shot Anomaly Detection
- Zehao Deng
- Qingzhi Ma
- An Liu
Zero-shot Anomaly Detection (ZSAD) is an emerging task in industrial settings. It
aims to detect anomalies in a target dataset without training samples, which is crucial
for sample scarcity and data privacy. Existing methods largely rely on CLIP, leveraging
its internal knowledge to detect anomalies. However, due to its pre-training on natural
image-text pairs, CLIP suffers from domain shift, favoring global semantics over fine-grained
defect detection in industrial images. Furthermore, most existing methods employ fixed
text prompts to guide the model, which struggle to describe diverse and unseen
anomalies, leading to poor accuracy. To address these limitations, we propose a Detail-guided
Dual-modal Enhancement Model (DDE-CLIP) for the ZSAD task. First, we design the
Detail Feature Reinforcement Module (DFRM) to capture local representations of minute
defects. Its specialized design effectively enhances the model's perception of fine-grained
anomalies and enables the pre-trained CLIP model to better adapt to the unique visual
characteristics of industrial images. Second, we introduce the Visual-guided
Text Refinement Module (VTRM), which dynamically optimizes text prompts based on
the input image's visual content (particularly the detail features captured by DFRM).
This ensures the accurate reflection of text prompts on specific semantics of various
defects, thereby significantly enhancing the alignment between vision and text for
unseen anomalies. Overall, our DDE-CLIP uses detail features to enhance both image
and text modalities, effectively addressing the challenges of ZSAD. Extensive experiments
on 7 real-world industrial product datasets demonstrate that DDE-CLIP exhibits superior
detection and localization capabilities compared to other methods. The code is available
at https://github.com/zhushengxinyue/DDE-CLIP.
Urban In-context Learning: A New Paradigm for Urban Indicator Prediction
- Zerong Deng
- Liangzhe Han
- Tongyu Zhu
- Ziqi Miao
- Yi Xu
- Leilei Sun
Recent years have witnessed rapid urbanization. In this context, urban indicator prediction
has become an important tool for urban planning and decision-making. However, existing
methods have two drawbacks. First, they follow the ''pre-training and fine-tuning'' paradigm,
which is time-consuming and resource-intensive. Second, to encode urban knowledge
for downstream tasks effectively, complex pre-training tasks must be designed to train
the model in a task-agnostic manner while ensuring generalization. In this work, we
propose UrbanICL, an urban in-context learning framework as a new paradigm for urban
indicator prediction. Compared to directly predicting urban indicators, we obtain
predictions for new regions by aggregating the downstream labels of similar regions.
Specifically, a retrieval-based urban in-context learning module is proposed to retrieve
regions with similar urban semantics and aggregate their corresponding labels to make
predictions for new regions. We also design a region-dependent distribution learning
module to learn the new distribution of unknown regions and facilitate the adaptation
of UrbanICL for distributional shifts and outliers. Our framework, with in-context
learning, brings a new insight for urban indicator prediction. We conduct extensive
experiments on real-world datasets collected from three cities. The experimental results
demonstrate the effectiveness of UrbanICL, which operates with extremely low resource
consumption and high time efficiency.
DIVAgent: A Diversified Search Agent that Mimics the Human Search Process
- Zhirui Deng
- Jingfen Qiao
- Zhicheng Dou
- Ji-Rong Wen
- Maarten de Rijke
Search result diversification plays a crucial role in addressing query ambiguity and
multi-faceted information needs by reducing redundancy across documents. While previous
supervised approaches can achieve superior performance, they require costly, large-scale
annotated data. In contrast, unsupervised methods are more flexible and training-free
but rely on manually designed ranking functions, often leading to suboptimal performance.
Inspired by how humans explore diverse information during real-world search, we
propose DIVAgent, a diversified search agent that combines the advantages of supervised
and unsupervised methods. DIVAgent introduces LLMs as the ''brain'' to reason over
complex and diverse search results and delineate human cognitive processes into a
workflow tailored for search result diversification. Our search agent first identifies
potential user intents and then analyzes the alignment of each document to the intents
via an intent-aware module. To guide the generation of diversified document rankings,
we design an intent-guided ranker that explicitly links documents to their dominant
intents while performing greedy document selection. Experimental results demonstrate
that DIVAgent significantly outperforms existing unsupervised baselines and achieves
competitive performance with supervised models, highlighting the promise of LLMs for
diversified ranking in realistic search scenarios.
Unsupervised Adversarial Contrastive Hashing for Cross-Modal Retrieval
- Guohui Ding
- Zhonghua Li
- Rui Zhou
- Qian Gao
Cross-modal hashing has gained widespread attention in cross-modal retrieval due to
its low storage cost and significant computational efficiency. Existing cross-modal
hashing methods primarily focus on learning modality invariance by mapping data from
different modalities into a shared space and learning unified hash codes. Nevertheless,
due to the inherent heterogeneity between different modalities, the common subspace
may still exhibit modality discrepancies. This ultimately makes it challenging to
achieve semantic alignment, thereby affecting the accuracy of cross-modal retrieval.
To address this issue, we propose an Unsupervised Adversarial Contrastive Hashing
(UACH) method for cross-modal retrieval. Specifically, we design a cycle generative
adversarial network to learn the transformation relationships between different modality
feature domains, effectively promoting semantic alignment across modalities. Additionally,
we employ dual contrastive learning to jointly optimize the representation learning
and hash learning components of each modality and to learn unified hash codes per
modality, thus mitigating the impact of modality discrepancies.
Extensive experiments conducted on three cross-modal benchmark datasets demonstrate
that our model outperforms the state-of-the-art baselines.
EventPuzzle: A Benchmark for Multi-Perspective Event Prediction Based on Event Arguments
- Guoxuan Ding
- Junhao Zhou
- Yuqing Li
- Xiaobo Guo
- Xin Wang
- Daren Zha
Event prediction is a critical task in natural language processing, aimed at reasoning
and forecasting future events based on known event texts. This paper introduces EventPuzzle,
a benchmark designed to evaluate the event prediction capabilities of large language
models based on event arguments. By introducing argument points, we design tasks and
evaluation methods to assess models' ability to predict events from different argument
perspectives. EventPuzzle consists of both closed-ended and open-ended tasks. In the
closed-ended task, models select the correct argument point from causal chains, while
in the open-ended task, models generate event descriptions using two strategies: Argument-based
Generation and Direct Generation. We construct an argument point dataset and evaluate
multiple LLMs, demonstrating the models' performance across various tasks. Our experimental
analysis reveals the strengths and limitations of current models and suggests future
directions for improving event prediction.
Adaptive Bidirectional State Space Model for High-frequency Portfolio Management
- Wei Ding
- Hanpeng Jiang
- Ruibo Xiong
- Yongrong Wu
- Jingan Chen
- Lifan Chen
- Pengfei Ding
- Fan Lin
State space models (SSMs) have recently shown great potential on long-range sequence
modeling tasks. Benefiting from SSMs' low spatio-temporal overhead and powerful modeling
capabilities, utilizing them for high-frequency portfolio management is an appealing
research direction. However, representing financial data is challenging for SSMs due
to: 1) the non-stationary nature of financial markets and 2) the requirement of asset
correlations for financial understanding. In this paper, under a deep reinforcement
learning (DRL) paradigm for high-frequency portfolio management, we propose a novel
Adaptive Bidirectional State Space Model (ABSSM) to tackle the above challenges. Specifically,
in order to cope with changing market conditions, we design an adaptive linear time-varying
structure, which precisely captures domain shifts in temporal patterns through an
input-dependent state transition matrix, thereby seizing fleeting arbitrage opportunities.
Furthermore, we enhance this framework by constructing a bidirectional state space
layer, which extracts asset correlations by compressing the global context. To the
best of our knowledge, this is the first work that solves the high-frequency portfolio
management problem by devising a specialized state space model in the DRL framework.
Through extensive experiments on real-world data from the U.S., China, and cryptocurrency
markets, we show that our proposed ABSSM significantly outperforms state-of-the-art
benchmark methods in balancing profits and risks.
Exploring the Impact of Warnings on User Perception towards AI-Generated Content in
Search Results
- Pia Donabauer
- David Elsweiler
Generative-AI answer boxes have become a default element of modern search engines,
yet their responses are not always trustworthy. We study whether a simple disclosure
can temper the influence of these answers on users' beliefs and behaviour. In a between-subjects
online experiment (N=57) participants formed opinions on one of three controversial
topics while interacting with a SERP whose featured answer was produced by ChatGPT.
Each participant had a 50% chance of seeing a banner that (i) disclosed the answer's
AI origin, (ii) listed three key limitations, and (iii) linked to additional details.
The banner did not shift overall opinion means, but it changed how people reacted
to the AI: participants who saw the warning were 83% more likely to adopt a stance
that contradicted the chatbot's answer than those in the no-banner condition. Stance
alignment proved even more decisive: when the AI answer matched a user's initial view,
the likelihood of attitude change fell by 85% and exploration of opposing results
dropped significantly. Conversely, when the AI disagreed with users, 41% of their
post-task explanations converged semantically toward the chatbot's wording (vs. 14%
when they already agreed), revealing subtle linguistic uptake detected via BERT embeddings.
Together these findings show that (1) a lightweight disclosure can promote a more
critical stance toward AI output, yet (2) pre-existing agreement with the AI strongly
anchors users, suppressing both critical search behaviour and reflective revision.
We argue that AI-integrated search interfaces should pair transparency banners with
additional bias-mitigation strategies to support informed and balanced information
seeking.
Decoupling Feature Entanglement for Personalized Federated Learning via Neural Collapse
Heterogeneous data is a critical challenge in Personalized Federated Learning (pFL),
as it leads to feature entanglement, making it difficult to share knowledge across
clients. Addressing this issue requires a deep understanding of the underlying mechanisms
of feature distribution, as well as effective strategies for sharing knowledge among
participating clients. Motivated by the phenomenon of Neural Collapse (NC) observed
in well-trained deep classification models, we propose FedDemux, a novel pFL framework
that facilitates the personalization process by explicitly promoting the emergence
of NC. FedDemux tackles feature entanglement through the coordination of two key modules:
(1) a simplex learnable embedding (SLE) module guided by NC to learn and rectify local
features, and (2) a knowledge decoupling module (KDM) that extracts general knowledge
to align local features with the global simplex-learnable embeddings, while personalized
knowledge further enhances local inference capabilities. We conduct extensive experiments
on three real-world datasets with heterogeneous settings, where FedDemux consistently
outperforms state-of-the-art methods in all cases. Specifically, FedDemux achieves
up to a 13.54% accuracy improvement over baselines on the CIFAR-100 dataset. Scalability
and ablation experiments validate FedDemux's effectiveness and SLE's role in accelerating
convergence.
MUSE: A Multi-slice Joint Analysis Method for Spatial Transcriptomics Experiments
- Ziheng Duan
- Xi Li
- Zhiqing Xiao
- Rex Ying
- Jing Zhang
Recent advances in spatial transcriptomics (ST) and cost reductions have enabled large-scale
multi-slice ST data generation, enhancing the statistical power to detect subtle biological
signals. However, cross-slice inconsistencies and data quality variability present
significant analytical challenges. To overcome these limitations, we developed MUSE,
a computational framework designed for multi-slice joint embedding, spatial domain
identification, and gene expression imputation. Specifically, MUSE integrates a two-module
architecture to ensure robust cross-slice alignment and data harmonization. The alignment
module models each slice as a graph and employs optimal transport to align cells across
slices while preserving spatial continuity. The optimization module further refines
integration by incorporating an alignment loss, allowing lower-quality data to leverage
structural information from higher-quality slices. Additionally, MUSE generates virtual
neighbors from aligned cells, enriching contextual information and mitigating data
sparsity. These design principles enable seamless integration with existing single-slice
methods, extending their applicability to multi-slice ST analysis. To comprehensively
evaluate its performance, we applied MUSE to 12 real and 48 simulated datasets spanning
a range of data qualities. Across all metrics, MUSE consistently outperformed existing
methods in cross-slice consistency, spatial domain identification, and gene expression
imputation. To promote accessibility and adoption, we provide MUSE as an open-source
software package. As multi-slice ST datasets become increasingly prevalent, MUSE provides
a robust and extensible framework designed to effectively integrate growing numbers
of slices, thereby advancing the analysis of tissue architectures and spatial gene
expression in complex biological systems.
Hearable Image: On-Device Image-Driven Sound Effect Generation for Hearing What You
See
- Deokjun Eom
- Nahyun Kim
- Woohyun Nam
- Kyung-Rae Kim
- Chaebin Im
- Jungwon Park
There have been various studies on audio generation from images, text, or video. However,
existing approaches have not considered the on-device environment, because audio generation
models are computationally expensive and require heavy storage capacity to store a large
number of weights. In addition, it is difficult to obtain stable generation outputs, because
unexpected results may occur depending on the model inputs. In image-to-audio generation,
smartphones contain diverse images, and image features carry many visual contexts, so it
is sometimes unpredictable which audio categories will be generated from an image. In this
paper, we propose a robust on-device sound effect generation framework that performs
image-to-audio generation based on latent diffusion. First, to avoid unstable and unpredictable
audio generation results, we propose a stable sound generation framework with an Audio Feature
Dictionary and an Audio-Image Matching Pipeline to generate sound effects from predefined
sound effect categories. If an image matches sound effect categories, the proposed framework
directly generates sound effects from the audio features corresponding to the matched categories.
Second, we propose Multi-Category Generation and a Generation Flow Map to generate robust and
diverse sound effects depending on the audio categories. Using global and local features
of an image, we can select multiple categories of sound effects. Third, the framework
can be implemented on smartphone devices because we train the proposed model with
low computational cost and a small number of model weights under 4-step latent diffusion
inference. Various experiments show that the proposed framework solves the on-device sound
generation problem while maintaining generation quality and audio-image matching performance
compared to large-scale models. Our demo is available at: https://youtu.be/Y5HTr8wwqOA.
LLM-Enhanced Generalized Category Discovery via Iterative Graph Diffusion
- Kangjia Fan
- Yilong Zhao
- Daifeng Li
- Changze Lin
- Weijun Zhang
- Zhiwen Zhong
Generalized Category Discovery (GCD) aims to identify both known and unknown categories
from unlabeled data using limited labeled samples from known classes. The key challenges
lie in the scarcity of supervision signals for unknown categories and the difficulty
of modeling relationships among samples. Existing methods that rely solely on clustering
uncertainty often result in imperfect hard negative selection, while their use of
nearest-neighbor structures hinders the effective utilization of Large Language Model
(LLM) annotations for obtaining high-quality supervision. We propose a dynamic optimization
model (LIGD) that leverages diffusion graphs and LLM annotations to address these
issues. By leveraging the semantic-correlation graph, the method achieves two key
capabilities: it selects both hard negatives and unlabeled central samples likely
to represent novel categories as high-value samples. In addition, the graph enables
effective label propagation through its connected subgraph, significantly reducing
computational costs while enhancing the accuracy of category discovery. To further
enhance annotation quality, we introduce a two-stage prompting strategy that queries
the LLM twice to accurately assign selected samples to either existing or novel categories.
This process is repeated iteratively to update the graph structure and node representations
until convergence. Experiments on three GCD datasets
demonstrate the significant superiority of LIGD. Most notably, in the challenging
scenario where only 25% of categories are labeled, the model achieves substantial
improvements while reducing the number of LLM queries by 50%. Code and data are available
at https://github.com/wdmmxlbt/LIGD.
Enhancing Multi-Behavior Sequential Recommenders with Behavior-Aware Regularization
- Yongfu Fan
- Jin Chen
- Yangzixuan Jiao
- Ximu Zeng
- Liwei Deng
- Kai Zheng
In the realm of multi-behavior sequential recommendation (MBSR), the complexity and
heterogeneity of user interactions pose substantial challenges for sequence modeling.
Existing studies involve significant efforts in combining different modules to learn
more expressive multi-behavior sequence representations or designing strategies to
extract user preferences related to the target behavior. Despite their effectiveness,
these methods neglect a thorough analysis of how behavioral information shapes the
probability distribution for next-item prediction, which is crucial for accurately
modeling user preferences. To this end, we first analyze the learning distribution
of MBSR, shedding light on the significance of target behavior in next-item prediction.
Building upon this insight, we propose a Behavior-Aware Regularization approach for multi-behavior sequential Recommendation (BAR4Rec), where we introduce a regularization loss function to preserve
the intrinsic constraints of target behavior. In this way, the target probability
distribution is extracted from the whole distribution and naturally evolves into a
more compatible and tractable form, thus facilitating model design and training. We
evaluate the proposed method on three real-world datasets, and the results validate
the efficacy of our approach.
MMFair: Fair Learning via Min-Min Optimization
- Kejie Fang
- Kun Zhai
- Xingjun Ma
Ensuring group fairness is crucial in applications like facial recognition, medical
image analysis, and online comment toxicity classification. A key challenge to achieving
group fairness arises from spurious correlations in datasets, where features used
by models for predictions are unrelated to the true labels. The widespread use of
large-scale pre-trained models as feature extractors, followed by fine-tuning for
downstream tasks, can exacerbate this issue. In particular, improper fine-tuning on
limited data often leads to overfitting, reinforcing spurious correlations and further
undermining group fairness. To address this, we propose MMFair, an algorithm that optimizes perturbations through a min-min optimization approach. These perturbations are applied to the deep embeddings, preventing the
model from associating irrelevant features with true labels, thus improving group
fairness. Notably, since simple linear classifiers are prone to spurious correlations,
we use a linear head in the initial stage to generate perturbations. After optimizing
the perturbations in the latent space, we incorporate them into the original embeddings
and then train a multi-layer perceptron (MLP) as the final classification head. This
two-stage approach helps mitigate the bias problem of the linear head, while leveraging
the more powerful feature-learning capabilities of MLPs, leading to more stable and
accurate classification results. We evaluate MMFair on the Waterbirds, CelebA, and ISIC datasets. The results show that MMFair improves the accuracy of the worst-performing group efficiently.
Higher-Order Information Matters: A Representation Learning Approach for Social Bot
Detection
- Min Gao
- Qiang Duan
- Boen Liu
- Yu Xiao
- Xin Wang
- Yang Chen
Detecting social bots is crucial for mitigating the spread of misinformation and preserving
online conversation authenticity. State-of-the-art solutions typically leverage graph
neural networks (GNNs) to model user representations from social relationships and
metadata. However, these approaches overlook two key factors: the similarity between a
user and her neighbors, and the coordinated behaviors of social bots, resulting
in a suboptimal detection performance. To address these issues, we propose HyperScan,
a novel representation learning method for social bot detection. Specifically, we
introduce three effective learners to capture pair-wise, hop-wise, and group-wise
relations. HyperScan learns pair-wise user representations based on social relations
and user features. It then enhances user representations by building hop-wise interactions
across the learned pair-wise user representations for capturing the structure-level
proximity information. Subsequently, it models user representations by constructing
higher-order (group-wise) relations derived from user profiles, tweets, and social
relations to capture the feature-level proximity knowledge. By leveraging hop-wise
interactions and higher-order relations, HyperScan significantly improves bot detection
performance. Our extensive experiments demonstrate that HyperScan outperforms state-of-the-art
methods on three benchmark datasets. Additional studies validate the robustness and
effectiveness of each component of HyperScan.
DT-FedSDC: A Dual-Target Federated Framework with Semantic Enhancement and Disentangled
Contrastive Learning for Cross-Domain Recommendation
- Shanyang Gao
- Shanfeng Wang
- Lanyu Yao
- Jianzhao Li
- Zhao Wang
- Maoguo Gong
- Ke Pan
Federated cross-domain recommendation aims to alleviate the problem of data sparsity
and enable collaborative modeling of user behavior data from different platforms or
institutions while ensuring data privacy. Most existing federated cross-domain recommendation
methods rely on item IDs for modeling, ignoring the mining and utilization of item
semantic information. In addition, due to the heterogeneity of data between different
domains, the model is prone to domain bias and feature coupling problems during the
aggregation process, which negatively impacts the recommendation performance. This
paper proposes a dual-target federated cross-domain recommendation framework with
semantic enhancement and disentangled contrastive learning. First, to utilize the semantic
information of items, item ID features and text semantic features are jointly fused
to enhance the item embedding representations. Second, we propose a user representation
decoupling mechanism to explicitly decouple users' preferences into shared and domain-specific
preferences, thereby alleviating domain bias and feature coupling problems. Furthermore,
we design a cross-domain contrastive learning module on the server side to enhance
the consistency and transferability of shared representations between user representations
across different domains. Experimental results show that the proposed algorithm performs
significantly better than existing state-of-the-art methods on multiple real-world datasets,
demonstrating its excellent performance in federated cross-domain recommendation.
Bidirectional Temporal-Aware Modeling with Multi-Scale Mixture-of-Experts for Multivariate
Time Series Forecasting
- Yifan Gao
- Boming Zhao
- Haocheng Peng
- Hujun Bao
- Jiashu Zhao
- Zhaopeng Cui
Recent advances in deep learning have significantly boosted performance in multivariate
time series forecasting (MTSF). While many existing approaches focus on capturing
inter-variable (a.k.a. channel-wise) correlations to improve prediction accuracy,
the temporal dimension, particularly its rich structural and contextual information,
remains underexplored. In this paper, we propose BIM3, a novel framework that integrates BIdirectional temporal-aware modeling with Multi-Scale Mixture-of-Experts for MTSF. First, unlike existing methods that treat historical and future temporal information
independently, we introduce a novel Timestamp Dual Cross-Attention Module, which employs
a symmetric cross-attention mechanism to explicitly capture bidirectional temporal
dependencies through timestamp interactions. Second, to address the complex and scale-varying
temporal patterns commonly found in multivariate time series, we move beyond recent
multi-scale forecasting models that share parameters across all channels and fail
to capture channel-specific dynamics. Instead, we design a Multi-Scale Feature Extract
Mixture-of-Experts module that adaptively routes time series to specialized experts
based on their temporal characteristics. Extensive experiments on multiple real-world
datasets show that BIM3 consistently outperforms state-of-the-art methods, highlighting its effectiveness
in capturing both temporal structure and inter-variable diversity.
LangPTune: Optimizing Language-based User Profiles for Recommendation
- Zhaolin Gao
- Joyce Zhou
- Yijia Dai
- Thorsten Joachims
Recent works have shown increasing interest in using natural language-based user profiles
for recommender systems, as they offer greater transparency and interpretability compared
to traditional embedding-based methods. Most existing approaches rely on zero-shot
inference with large language models (LLMs) to generate these profiles, but the resulting
quality remains insufficient, leading to suboptimal recommendation performance. In
this paper, we present LangPTune, the first end-to-end training framework designed
to directly optimize LLM-generated user profiles for recommendation tasks. By explicitly
training the LLM for the recommendation objective, our approach significantly outperforms
zero-shot baselines. Evaluations across training setups and benchmarks show that LangPTune
not only exceeds the performance of zero-shot methods but also matches the performance
of state-of-the-art embedding-based baselines. Additionally, we assess whether our
training framework maintains the interpretability of user profiles, using both GPT-4
simulations and crowdworker studies.
High-Order Moments Conditional Domain Adaptation Networks for Wearable Human Activity
Recognition
- Indrajeet Ghosh
- Garvit Chugh
- Abu Zaher Md Faridee
- Nirmalya Roy
Developing scalable wearable human activity recognition (wHAR) models is challenging
due to domain shifts that substantially degrade performance across downstream tasks.
Unsupervised domain adaptation (UDA) seeks to improve generalization by transferring
knowledge from labeled source domains to unlabeled target domains. However, conventional
UDA methods primarily align marginal feature distributions while neglecting feature-label
dependencies, often leading to negative transfer and sub-optimal performance. Motivated
by these limitations, we propose CoDAN, a novel optimization framework that tackles two key
challenges: (i) generating reliable pseudo-labels for the unlabeled target domain
and (ii) minimizing conditional discrepancies across domains. To address (i), we employ
temperature-based entropy minimization (TEM), which calibrates prediction confidence by scaling logits with a temperature
parameter to produce robust pseudo-labels. For (ii), we introduce a polynomial kernel-based cross-covariance (PkCC) loss, a high-order statistics-driven approach that maps features into a reproducing
kernel Hilbert space (RKHS) to capture richer feature-label dependencies and reduce
conditional distribution gaps between domains. In addition, we demonstrate that CoDAN readily extends to partial UDA (pUDA), where the target label space is a subset of
the source, and extensive evaluations on public wHAR datasets with diverse label spaces
validate its superior performance over state-of-the-art methods in both UDA and pUDA
scenarios.
PathLens: Structurally Enhancing Heterophilic Graphs for GNNs
- Karan Goyal
- Saankhya Samanta
- Vikram Goyal
- Mukesh Mohania
The notion that standard GNNs perform better on graphs with high homophily has led to
the development of specialised algorithms for heterophilic datasets in recent years.
In this work, we both examine and leverage this notion. Rather than creating new algorithms,
we emphasise the importance of understanding and enriching the data. We introduce
a novel data engineering technique, PathLens, that enhances the performance of both
heterophilic and non-heterophilic GNNs on heterophilic datasets. Our method structurally
augments a given heterophilic graph by adding supernodes, thereby creating a network
of pathways connecting spectral clusters in the graph. It facilitates additional paths
to bring similar nodes (intraclass) closer than dissimilar ones (interclass) by reducing
the average shortest path lengths. We draw both intuitive and empirical connections
between the relative decreases in intraclass and interclass average shortest path
lengths and shifts in the graph's homophily levels, providing a novel perspective
that extends beyond traditional homophily measures. We conduct extensive experiments
on seven diverse heterophilic datasets using various GNN architectures and also compare
with several data-centric techniques, demonstrating significant improvements as high
as 37% in node classification performance. Furthermore, our empirical findings highlight
the strong sensitivity of several recent GNNs with respect to the random seed used
for data splitting, underscoring this often-overlooked factor in GNN evaluation. The
code will be available at https://github.com/goyalkaraniit/PathLens.
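As an illustration of the general construction (one supernode per spectral cluster, wired so that average shortest-path lengths shrink), the sketch below uses NetworkX and scikit-learn. It is a hypothetical toy version, not the PathLens algorithm; the cluster count, hub wiring, and names are our assumptions.

```python
import networkx as nx
import numpy as np
from sklearn.cluster import SpectralClustering

def add_cluster_supernodes(graph, num_clusters=4, seed=0):
    """Attach one supernode per spectral cluster, plus a hub connecting the
    supernodes, so that average shortest-path lengths shrink. An illustrative
    construction, not the PathLens algorithm itself."""
    nodes = list(graph.nodes())
    adj = nx.to_numpy_array(graph, nodelist=nodes)
    labels = SpectralClustering(
        n_clusters=num_clusters, affinity="precomputed", random_state=seed
    ).fit_predict(adj + 1e-6)            # small offset keeps affinities nonzero

    augmented = graph.copy()
    hub = "super_hub"
    augmented.add_node(hub)
    for k in range(num_clusters):
        supernode = f"super_{k}"
        augmented.add_node(supernode)
        augmented.add_edge(supernode, hub)
        for node, label in zip(nodes, labels):
            if label == k:
                augmented.add_edge(supernode, node)
    return augmented

# Toy usage on a small community-structured graph.
g = nx.connected_caveman_graph(4, 8)     # 4 communities of 8 nodes each
aug = add_cluster_supernodes(g, num_clusters=4)
print("avg shortest path before:", round(nx.average_shortest_path_length(g), 2))
print("avg shortest path after: ", round(nx.average_shortest_path_length(aug), 2))
```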
PERC: A Prior-Guided Framework for Classifying Long-Content Educational Resources
with Imbalanced Category Distributions
- Quanlong Guan
- Xiuliang Duan
- Zhi Chen
- Xingyu Zhu
- Jianbo Huang
- Xinzhong Liu
- Zonglin Liu
- Liangda Fang
With the rapid growth of online education, the types and volumes of educational resources
have increased significantly. Efficient classification of these resources can substantially
reduce manual workload and enhance management effectiveness. However, existing models
often struggle to accurately classify long-content educational resources, particularly
under imbalanced category distributions. To address these challenges, we propose PERC,
a prior-guided framework for classifying long educational resources with imbalanced
categories. To the best of our knowledge, PERC is the first framework to incorporate
the foundational cognitive dimensions of Bloom's Taxonomy into educational resource
classification. First, PERC leverages standardized pedagogical classification guidelines
and maps the original label space into a semantically structured prior category space
using a Structured State-Space Learning framework. Second, to handle the length and
high information density of educational texts, we introduce a Dynamic Sliding Window
Attention mechanism that captures both local and partial global dependencies, enabling
the extraction of compact, semantically rich representations. Finally, a category-aware
classifier integrates the prior representation of each category with the semantic
representation of the resource to produce a category-aware embedding for final prediction.
To evaluate PERC, we constructed two datasets: EduMix-24 and EduMath-24, comprising
18,799 educational resources manually annotated across 9 lesson types, 15 teaching
modes, and 9 activity elements. Classification experiments on all three tasks consistently
demonstrate that PERC outperforms state-of-the-art baselines.
Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model
Editing
- Dongliang Guo
- Mengxuan Hu
- Zihan Guan
- Junfeng Guo
- Thomas Hartvigsen
- Sheng Li
Large pre-trained models have achieved notable success across a range of downstream
tasks. However, recent research shows that a type of adversarial attack (i.e., the backdoor
attack) can manipulate the behavior of machine learning models by contaminating their
training dataset, posing a significant threat to the real-world application of large
pre-trained models, especially customized ones. Therefore, addressing
the unique challenges of exploring the vulnerability of pre-trained models is of paramount
importance. Through empirical studies on the capability of performing backdoor attacks
on large pre-trained models (e.g., ViT), we find the following unique challenges of attacking large pre-trained models:
1) the inability to manipulate or even access large training datasets, and 2) the
substantial computational resources required for training or fine-tuning these models.
To address these challenges, we establish new standards for an effective and feasible
backdoor attack in the context of large pre-trained models. In line with these standards,
we introduce our EDT model, an Efficient, Data-free, Training-free backdoor attack method. Inspired by model editing techniques, EDT injects
an editing-based lightweight codebook into the backdoor of large pre-trained models,
which replaces the embedding of the poisoned image with the target image without poisoning
the training dataset or training the victim model. Our experiments, conducted across
various pre-trained models such as ViT, CLIP, BLIP, and stable diffusion, and on downstream
tasks including image classification, image captioning, and image generation, demonstrate
the effectiveness of our method. Our code is available at https://github.com/donglgcn/Editing/
Twin-Flow Generative Ranking Network for Recommendation
- Hao Guo
- Erpeng Xue
- Lei Huang
- Shichao Wang
- Xiaolei Wang
- Lei Wang
- Jinpeng Wang
- Zeshun Li
- Sheng Chen
Deep Learning Recommendation Models (DLRMs) often rely on extensive manual feature
engineering to improve accuracy and user experience, which increases system complexity
and limits scalability of model performance with respect to computational resources.
Recently, Meta introduced a generative ranking paradigm based on the HSTU block that enables
end-to-end learning from raw user behavior sequences, demonstrates a scaling law
on large datasets, and can be regarded as the state-of-the-art (SOTA). However, splitting
user behaviors into interleaved item and action information significantly increases
the input sequence length, which adversely affects both training and inference efficiency.
To address this issue, we propose the Twin-Flow Generative Ranking Network (TFGR),
which employs a Twin-Flow mechanism to optimize interaction modeling, ensuring efficient
training and inference through end-to-end token processing. TFGR duplicates the original
user behavior sequence into a real flow and a fake flow based on the authenticity
of the action information, and then defines a novel interaction method between the
real flow and the fake flow within the QKV module of the self-attention mechanism.
This design reduces computational overhead and improves both training efficiency and
inference performance compared to Meta's HSTU-based model. Experiments on both open-source
and real industrial datasets show that TFGR outperforms DLRM, which serves as the
industrial online baseline with extensive feature engineering, as well as Meta's HSTU
and other common recommendation models such as DIN, DCN, DIEN, and DeepFM. Furthermore,
we investigate optimal parameter allocation strategies under computational constraints,
establishing TFGR as an efficient and effective next-generation generative ranking
paradigm.
When Variety Seeking Meets Multi-Sided Recommendation Fairness: A Consistent and Personalized
Multi-Objective Optimization Framework
- Jiayi Guo
- Jiangning He
- Chenyan Wang
- Xinran Wu
Recommendation research has evolved from solely improving accuracy to addressing ethical
and fairness concerns. While prior works focus on optimizing fairness from either
the user or product perspective, recent research emphasizes the importance of multi-sided
fairness. This issue is inherently challenging due to the competing goals of different
stakeholders. To tackle this challenge, we propose a Consistent and Personalized Fairness
Recommendation framework with Multi-Objective Integer Programming (CPFR-MOIP). Our
framework introduces two key innovations. First, we develop a novel similarity-based
individual fairness metric for the user side and formulate a consistent product-side
fairness metric, ensuring that the generated recommendation list aligns with the user
preference distribution and the expected product exposure distribution. Second, we
incorporate users' variety-seeking levels as a moderating factor to adjust fairness
trade-offs and introduce personalized weights to balance user-side and product-side
fairness. To effectively solve this optimization problem, we devise an alternating
algorithm with theoretical guarantee and demonstrate the Pareto optimality of the
obtained solutions. Extensive experiments on two real-world datasets demonstrate that
our CPFR-MOIP achieves superior multi-sided fairness while maintaining competitive
recommendation accuracy. Furthermore, ablation analysis highlights the advantages
of incorporating user variety-seeking levels for personalizing fairness trade-offs.
Our work paves the way for more ethical and personalized recommendation systems. The
implementation code is available at: https://github.com/P0ise-Wang/CPFR-MOIP.
Proto-Yield: An Uncertainty-Aware Prototype Network for Yield Prediction in Real-world
Chemical Reactions
- Kehan Guo
- Zhen Liu
- Zhichun Guo
- Bozhao Nan
- Olexandr Isayev
- Nitesh Chawla
- Olaf Wiest
- Xiangliang Zhang
Reaction yield prediction underpins computer-aided synthesis prediction (CASP). Formulated
as a regression problem that takes both reactants and products as input, this task
has been extensively studied using machine learning methods, based on handcrafted
fingerprint features, SMILES encoded by Transformers, and molecular graphs encoded
by Graph Neural Networks. However, a major limitation of these methods is their inability
to effectively capture and model the underlying uncertainties, arising both from the
inherently stochastic nature of chemical reaction processes and from inconsistencies
or noise in how yields are measured and reported. What makes this seemingly simple
regression problem even more challenging is the lack of any principled way to account
for the underlying uncertainties, owing to missing or unrecorded experimental processes
(a common occurrence in chemical labs).
Given these challenges, we propose a new formulation for yield prediction. Rather
than assuming a single deterministic yield value for a given reaction, we model the
outcome as a probabilistic distribution over three discrete yield regimes: high, medium,
and low, reflecting the inherent uncertainty in the reaction process, which is often
only partially observed. Accordingly, we propose Proto-Yield, an encoder-agnostic prototype network that models reactions as occurring in one
of three yield regimes: high, medium, or low. Without access to full reaction processes,
Proto-Yield learns to infer latent regimes and their associated yield distributions
from noisy, incomplete training data. During inference, Proto-Yield outputs both a
calibrated probability distribution over the yield regimes and the predicted yield
conditioned on each regime. Extensive experiments on a 41,000-reaction patent corpus
and two high-throughput benchmarks show that Proto-Yield improves R2 by up to 15% and reduces RMSE/MAE by 13% compared to baseline methods.
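As a rough sketch of the regime-based formulation (not the Proto-Yield implementation), the snippet below assigns a reaction embedding a probability over three yield regimes from distances to regime prototypes and mixes assumed per-regime mean yields into a single prediction; the embedding dimension, prototypes, and regime means are all placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
prototypes = rng.normal(size=(3, 16))              # one prototype per regime: high / medium / low
regime_mean_yield = np.array([0.85, 0.50, 0.15])   # assumed per-regime yield means

def predict(reaction_embedding, temperature=1.0):
    # Softmax over negative distances to the prototypes gives a regime distribution.
    dists = np.linalg.norm(prototypes - reaction_embedding, axis=1)
    logits = -dists / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    expected_yield = float(probs @ regime_mean_yield)
    return probs, expected_yield

probs, y = predict(rng.normal(size=16))
print(dict(zip(["high", "medium", "low"], probs.round(3))), "expected yield:", round(y, 3))
```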
KRAFT: A Knowledge Graph-Based Framework for Automated Map Conflation
- Farnoosh Hashemi
- Laks V.S. Lakshmanan
Digital maps play a crucial role in various applications such as navigation, fleet
management, and ride-sharing, necessitating their accuracy and currency, which require
timely updates. While the majority of geospatial databases (GDBs) provide high-quality
information, their data is (i) limited to specific regions and/or (ii) missing some
entities, even in their covered areas. Map conflation is the process of augmenting one
GDB with another to fill in missing spatial features. Existing map conflation
methods suffer from two main limitations: (1) They are designed for the conflation
of linear objects (e.g., road networks) and cannot simply be extended to non-linear
objects, thus missing information about most entities in the map. (2) They are heuristic
algorithmic approaches based on pre-defined rules, unable to learn entity matching in a
data-driven manner. To address these limitations, we design KRAFT, a learning-based
approach consisting of three parts: (1) Knowledge Graph Construction, where each GDB is represented by a knowledge graph; (2) Map Matching, where we use a knowledge graph alignment method as well as a geospatial feature
encoder to match entities in the obtained knowledge graphs; and (3) Map Merging, where we merge the entities matched by the previous modules in a consistent manner,
using a mixed integer linear programming formulation that fully merges the GDBs without
adding any inconsistencies. Our experimental evaluation shows that not only does KRAFT
achieve outstanding performance compared to state-of-the-art and baseline methods
in map conflation tasks, but each of its modules (e.g., Map Matching and Map Merging)
also separately outperforms traditional matching and merging methods.
Addressing the Distortion of Community Representations in Anomaly Detection on Attributed
Networks
- Enbo He
- Yitong Hao
- Yue Zhang
- Guisheng Yin
- Lina Yao
Anomaly detection on attributed networks, especially in the unsupervised scenario,
has garnered significant attention, and Contrastive Learning (CL)-based methods
have emerged as one of the state-of-the-art approaches for this task. However, existing
CL-based methods face a critical challenge: anomalous nodes infiltrate the sampled
local communities, leading to the distortion of community representation which fundamentally
limits the discriminative ability. Our theoretical analysis reveals that this distortion
is caused by two main mechanisms: the cross contamination and the aggregation bias.
A key oversight is treating all community members equally and ignoring their relative
reliability. To address these issues, we propose a CL-based ANomaly detectIon
Method on Attributed networks targeted at mitigating community distortions to enhance
anomaly discrimination (ANIMA for short), which incorporates a Truncation-Restriction
community encoder (TRC-Encoder) with an elaborate heuristic prior instruction to detect
and suppress anomalous contributions during community representation learning. Comprehensive
experiments on 7 datasets demonstrate that ANIMA outperforms 10 SOTA methods by 2.25-8.8%
AUC, validating the effectiveness of our approach in mitigating community distortions
and enhancing anomaly discrimination.
EEG-FSL: An EEG-Based Few-Shot Learning Framework for Music Recommendation
- Ming He
- Wenbo Luo
- Yongjie Zheng
- Junkai Zhang
- Xiaolei Gao
Brain-computer interface based on electroencephalogram (EEG) has demonstrated significant
potential for capturing users' implicit preferences, offering an innovative technique
for music recommendation. However, we face two key challenges: (1) ineffective distinction
of complex neural patterns in EEG signals, and (2) the cold-start problem, due to
limited user EEG samples. To address these issues, we present EEG-FSL, a novel framework
that integrates model-agnostic meta-learning (MAML) with dual-path neural feature
extraction for music recommendation. EEG-FSL applies an attention-enhanced EEG encoder
to extract meaningful patterns from brain signals through complementary pathways:
one pathway retains temporal and phase information, while the other focuses on extracting
common frequency-domain features. Furthermore, we utilize contrastive learning to
explore the intrinsic structure of the data, significantly improving the model's feature
differentiation ability. Additionally, we propose a meta-learning method which allows
EEG-FSL to quickly adapt to new users using only a small number of EEG samples, effectively
solving the cold-start problem. Extensive experiments conducted on a real-world dataset
demonstrate the effectiveness of the proposed method. Specifically, in few-shot scenarios,
compared to the best baseline, our approach improves mean squared error in score prediction
by 8.4% and classification accuracy by 16.8%. Consequently, our work provides a practical
solution for next-generation brain-computer interface applications, capable of delivering
highly personalized content recommendations while minimizing user data collection
requirements. Our code is available at https://anonymous.4open.science/r/EEG-FSL-code-72F3/.
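Since the abstract names MAML as the meta-learning backbone, the following is a minimal first-order MAML-style training loop (a simplification under our own assumptions, not the EEG-FSL code): each "task" stands in for one user's few EEG samples, the feature dimension and sampling are synthetic, and torch >= 2.0 is assumed for torch.func.functional_call.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr, loss_fn = 0.01, nn.CrossEntropyLoss()

def sample_task():  # hypothetical stand-in for one user's support/query EEG features
    x = torch.randn(16, 32); y = torch.randint(0, 2, (16,))
    return (x[:8], y[:8]), (x[8:], y[8:])

for step in range(100):
    (xs, ys), (xq, yq) = sample_task()
    fast = {n: p.clone() for n, p in model.named_parameters()}
    # Inner loop: adapt a copy of the weights on the support set.
    support_loss = loss_fn(torch.func.functional_call(model, fast, (xs,)), ys)
    grads = torch.autograd.grad(support_loss, list(fast.values()))
    fast = {n: p - inner_lr * g for (n, p), g in zip(fast.items(), grads)}
    # Outer loop: evaluate the adapted weights on the query set, update meta-parameters.
    query_loss = loss_fn(torch.func.functional_call(model, fast, (xq,)), yq)
    meta_opt.zero_grad(); query_loss.backward(); meta_opt.step()
```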
Improving the Safety of Medication Recommendation via Graph Augmented Patient Similarity
Network
- Ming He
- Yongjie Zheng
- Changle Li
- Man Zhou
Recommending optimal medication combinations for patients is a crucial application
of artificial intelligence in healthcare. Recent works typically use patients' electronic
health record combined with their current health conditions. However, these efforts
have the following issues: 1) they often reference historical visits unrelated to
the current situation, and 2) there is a latent risk of side effects from historical
prescriptions. Such issues raise concerns about the safety of medication recommendation.
To address this, we propose GPSRec, a novel Graph augmented Patient Similarity network
for medication Recommendation. By leveraging dual similarity measures to selectively
integrate historical visits, GPSRec effectively filters out irrelevant information,
improving the accuracy of recommendation. We further present a training strategy that
combines a pre-training method with a dual-threshold loss adjustment, reducing the risk
of adverse drug-drug interactions and enhancing the safety of recommendation.
Extensive experiment results on two real datasets demonstrate that GPSRec significantly
outperforms state-of-the-art methods. Notably, it achieves 30.11% and 24.92% improvements
in safety on the two datasets, respectively, while also attaining higher accuracy.
LinkGPT: Leveraging Large Language Models for Enhanced Link Prediction in Text-Attributed
Graphs
- Zhongmou He
- Jing Zhu
- Shengyi Qian
- Joyce Chai
- Danai Koutra
Inspired by the success of Large Language Models (LLMs) in language and vision tasks,
there has been growing interest in applying LLMs to graph tasks, particularly on Text-Attributed
Graphs (TAGs). However, most prior work tackles the node classification task. In this
work, we evaluate an LLM's ability to reason over structured data and infer new facts
based on learned patterns by focusing on link prediction (LP), the task of predicting
missing links between nodes, which is understudied in the literature. This task poses
two key challenges: (1) How to effectively integrate pairwise structural information,
which is crucial for LP performance, into LLMs, and (2) how to address the computational
bottleneck during inference. To tackle these challenges, we propose LinkGPT, the first LLM-based training and inference framework specifically designed for LP
on homogeneous TAGs. To enhance the LLM's ability to understand the underlying structure,
we carefully design a node encoder and pairwise encoder, and leverage a two-stage
instruction tuning to effectively incorporate the nodewise and pairwise information
into LLMs. For inference efficiency, we introduce a retrieval-reranking scheme. Extensive
experiments show that LinkGPT achieves state-of-the-art performance on real-world graphs and demonstrates superior
zero-shot and few-shot generalization. At inference time, it achieves a 10× speedup
while maintaining high LP accuracy.
FakeChain: Exposing Shallow Cues in Multi-Step Deepfake Detection
Multi-step or hybrid deepfakes, created by sequentially applying different deepfake
creation methods such as Face-Swapping, GAN-based generation, and Diffusion methods,
can pose an emerging and unforeseen technical challenge for detection models trained
on single-step forgeries. While prior studies have mainly focused on detecting a single
isolated manipulation, little is known about detection model behavior under such
compositional, hybrid, and complex manipulation pipelines. In this work, we introduce
FakeChain, a large-scale benchmark comprising 1-, 2-, and 3-Step forgeries synthesized using
five state-of-the-art representative generators. Using this approach, we analyze detection
performance and spectral properties across hybrid manipulations at different steps,
along with varying generator combinations and quality settings. Surprisingly, our
findings reveal that detection performance highly depends on the final manipulation
type, with the F1-score dropping by up to 58.83% when it differs from the training distribution. This clearly demonstrates that detectors
rely on last-stage artifacts rather than cumulative manipulation traces, limiting
generalization. Such findings highlight the need for detection models to explicitly
consider manipulation history and sequences. Our results highlight the importance
of benchmarks such as FakeChain, reflecting growing synthesis complexity and diversity
in real-world scenarios. Our sample code is available at https://github.com/minjihh/FakeChain.
LeadFairRec: LLM-enhanced Discriminative Counterfactual Debiasing for Two-sided Fairness
in Recommendation
- Yimin Hou
- Yue Kou
- Derong Shen
- Xiangmin Zhou
- Dong Li
- Tiezheng Nie
- Ge Yu
Fairness-aware recommendation has emerged as a pivotal research area in recent years.
Current fairness studies primarily examine two independent dimensions: user-side fairness
and item-side fairness. However, most approaches address each side's fairness in isolation
while neglecting their complex interdependencies. In this paper, we propose an LLM-Enhanced
DiscriminAtive Counterfactual Debiasing Model for Two-sided Fairness in Recommendation
(LeadFairRec). Specifically, we first design a two-sided causal graph that jointly
models provider-customer fairness interactions through their causal relationships.
Then we propose a discriminative counterfactual debiasing method, which effectively
removes spurious correlations while maintaining true user-item interactions. Finally,
we propose an LLM-enhanced counterfactual inference method to derive noise-resistant
user/item representations from interaction data, enhancing the robustness of causal
debiasing. The experimental results demonstrate the high effectiveness of our proposed
model. We provide our code at https://github.com/houyimin660/LeadFairRec.
Model-Agnostic Iterative Graph Diversification for Improving Learning to Solve Graph
Optimization Problems
- Bay-Yuan Hsu
- Chia-Hsun Lu
- Chih-Ya Shen
A recent line of research on learning to solve graph optimization problems has attracted
much attention. However, most recent machine learning approaches to graph
optimization problems usually employ graph generators to randomly generate training
graphs, which may lead to overfitting and deteriorate the model's generalization.
To tackle this issue, we observe that enhancing the diversity of training graphs is
a crucial factor in improving the model's performance. Therefore, in this paper, we
formulate a new research problem, named Graph Augmentation for Diversity Maximization (GRAM), to maximize the training graph diversity by performing graph modifications. We first
analyze the NP-hardness of GRAM. We then propose a 2-approximation algorithm and formally
analyze its performance guarantee. Experimental results on well-known graph optimization
problems show that our proposed approach significantly outperforms the baselines,
such as graph augmentation and deep learning-based graph generation approaches.
GRIT: An Accurate and Efficient Graph Stream Summarization for Temporal Query
- Jingxian Hu
- Guozhang Sun
- Xin Wang
- Yuhai Zhao
- Yuan Li
- Xingwei Wang
Graph stream summarization refers to the technique used to process graph streams (unbounded
sequences of edges) by constructing compressed representations that support approximate
queries on both graph topology and temporal information in computing power networks.
However, existing methods struggle to achieve accurate and efficient temporal queries
due to two key limitations: (1) inefficient integration of temporal information, leading
to high latency in both edge processing and query execution; and (2) redundant multilayer
structures that accumulate errors, significantly reducing query accuracy. In this
paper, we propose GRIT, an accurate and efficient Graph stReam summarIzation for Temporal query. GRIT introduces a new structure FlatIndex, which organizes temporal
information in a flattened form, playing a critical role in minimizing error accumulation
and ensuring accurate temporal queries. To further enhance edge processing efficiency,
we introduce a lazy update strategy, which updates only a single element in the FlatIndex
upon edge insertion, significantly reducing insertion latency. Moreover, our greedy-based
decomposition (GBD) algorithm decomposes the target query range into the minimal number
of intervals corresponding to the FlatIndex, enabling efficient execution of temporal
queries over arbitrary time ranges. Extensive experiments on five real-world datasets
demonstrate that GRIT improves query accuracy by 2-3 orders of magnitude, while reducing
query latency by 1-2 orders of magnitude and increasing throughput by 7-13 times compared
to state-of-the-art methods.
Distributed Computation of k-Vertex Connected Components in Large Scale Networks
- Xinchao Hu
- Yuan Li
- Feng Guo
- Shan Huang
- Guoli Yang
- Yuhai Zhao
Recently, k-vertex connected component (k-VCC) detection has gained significant attention in graph analysis owing to its ability
to capture structural cohesion. A k-VCC remains connected even after the removal of any k-1 vertices from itself. The
k-VCC has broad applications across multiple domains, such as social network analysis,
cybersecurity, and bioinformatics. Yet, the existing exact k-VCC detection algorithms require repeated computation of minimum vertex cuts, imposing
a prohibitive computational cost for large scale graphs. In this paper, we present
an approximate algorithm for k-VCC detection that leverages Monte Carlo sampling to accelerate minimum vertex cut
computation with theoretical guarantee. Further, we design a distributed algorithm
for mining all k-VCCs, named DkVCC. DkVCC adopts a divide-and-conquer strategy, decomposing the problem
into smaller subgraph mining tasks that can be executed concurrently. Specifically,
we generate tasks from individual vertices to construct initial subgraphs, and then
iteratively expand and merge the subgraphs to form the final k-VCCs. Extensive experiments on 5 large real datasets demonstrate the efficiency
of our proposed algorithms. For example, we achieve 4× runtime speedup on the LiveJournal
dataset with 3.99M vertices and 34.7M edges in a 3-node cluster.
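To illustrate the k-VCC property being mined (not the distributed DkVCC algorithm itself), the short networkx sketch below checks whether an induced subgraph stays connected after removing any k-1 vertices, i.e. whether its node connectivity is at least k; the toy graph is our own.

```python
import networkx as nx

def is_k_vertex_connected(G, nodes, k):
    """A vertex set is a k-VCC candidate if its induced subgraph is k-vertex-connected."""
    H = G.subgraph(nodes)
    return len(H) > k and nx.node_connectivity(H) >= k

G = nx.Graph()
G.add_edges_from(nx.complete_graph(5).edges())   # a 4-connected clique on {0..4}
G.add_edges_from([(4, 5), (5, 6)])               # a dangling path
print(is_k_vertex_connected(G, range(5), k=3))   # True: removing any 2 vertices keeps it connected
print(is_k_vertex_connected(G, G.nodes(), k=3))  # False: vertex 4 is a cut vertex for the path
```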
Mixture of Semantic and Spatial Experts for Explainable Traffic Prediction
- Yang Hu
- Shaobo Li
- Dawen Xia
- Zhiheng Zhou
- Wenyong Zhang
- Huaqing Li
- Xingxing Zhang
- Senzhang Wang
To satisfy the growing demand for traffic prediction induced by urbanization, intelligent
transportation systems integrating various cutting-edge artificial intelligence
technologies, with large language models (LLMs) as a representative example, have been
developed. However, existing methods are mostly confined to shallow LLM utilization, where
the semantic capacity of LLMs is ignored and the traffic data are fed in directly. Furthermore,
the modality diversity of different traffic prediction scenarios (e.g., flow, speed,
and demand) remains underexplored, which restricts model flexibility
towards downstream applications. To mitigate these limitations, we propose a Mixture
of Semantic and Spatial Experts (SS-MoE) for traffic prediction along with the human-intelligible
post-hoc result explanation. Specifically, to enlighten the traffic predictor with
abundant semantic information, we design hierarchically coarse- and fine-grained prompts
including role assignments, dataset descriptions, and background supplements, which
serve as auxiliary knowledge for downstream prediction. Afterwards, considering
the diversity of real-world traffic scenarios, we construct the MoE framework consisting
of a spatial expert, a semantic expert, and a general expert, which accounts for the
node-level features, the semantic representations, and the overall generalization,
respectively. Finally, we instruct the LLM to explain and analyze the final prediction,
which is able to provide insightful conclusions and support intelligent transportation
decisions, forming a unified prediction-explanation pipeline. Extensive experiments
on five public traffic datasets demonstrate the superiority of SS-MoE across three
traffic prediction tasks. Experimental results indicate that the MAE and RMSE values
of SS-MoE are reduced by up to 4.04% and 3.20% compared with those of the runner-up,
respectively.
Revisiting the Inner Product Method: Optimizing Sparse Matrix Multiplication via Set
Intersection
- Zheng Hu
- Boyu Yang
- Weiguo Zheng
Sparse matrices are extensively used to model interactions between entities and facilitate
computations in neural networks. Sparse Matrix Multiplication (SpGEMM) serves as a
fundamental operation in graph algorithms, social network analysis, and deep learning,
attracting considerable research interest. Among the four primary paradigms for defining
sparse matrix multiplication, the Inner Product (IP) method most closely aligns with
the standard definition of matrix multiplication. However, due to its limited data
reuse and reliance on index matching, the IP method has been rarely explored in the
literature. This paper investigates the strong connection between SpGEMM and set intersection
computation, introducing a hybrid sparse matrix multiplication algorithm that builds
upon the numerical computation of the IP method. By leveraging the IP method's advantages-such
as minimal intermediate results and high flexibility-our approach effectively enhances
computational efficiency. Experimental evaluations on benchmark datasets demonstrate
the superiority of the proposed algorithm, particularly in scenarios where the resulting
matrix exhibits high sparsity. Furthermore, our method proves effective in several
applications, including self-transpose multiplication and sparse matrix multiplications
in graph neural networks.
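The inner-product view of SpGEMM described above maps directly onto sorted-list intersection; the following is a toy Python illustration (not the paper's optimized algorithm) where each output entry accumulates products only over the intersection of a CSR row's column indices and a CSC column's row indices, found with a two-pointer merge.

```python
import numpy as np
from scipy.sparse import random as sparse_random

def inner_product_spgemm(A_csr, B_csc):
    C = np.zeros((A_csr.shape[0], B_csc.shape[1]))
    for i in range(A_csr.shape[0]):
        a_idx = A_csr.indices[A_csr.indptr[i]:A_csr.indptr[i + 1]]
        a_val = A_csr.data[A_csr.indptr[i]:A_csr.indptr[i + 1]]
        for j in range(B_csc.shape[1]):
            b_idx = B_csc.indices[B_csc.indptr[j]:B_csc.indptr[j + 1]]
            b_val = B_csc.data[B_csc.indptr[j]:B_csc.indptr[j + 1]]
            p, q, acc = 0, 0, 0.0
            while p < len(a_idx) and q < len(b_idx):   # sorted-list intersection
                if a_idx[p] == b_idx[q]:
                    acc += a_val[p] * b_val[q]; p += 1; q += 1
                elif a_idx[p] < b_idx[q]:
                    p += 1
                else:
                    q += 1
            C[i, j] = acc
    return C

A = sparse_random(6, 5, density=0.3, format="csr", random_state=0)
B = sparse_random(5, 4, density=0.3, format="csc", random_state=1)
A.sort_indices(); B.sort_indices()
assert np.allclose(inner_product_spgemm(A, B), (A @ B).toarray())
```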
CS-Agent: LLM-based Community Search via Dual-agent Collaboration
- Jiahao Hua
- Long Yuan
- Qingshuai Feng
- Qiang Fan
- Shan Huang
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural
language processing tasks, yet their application to graph structure analysis, particularly
in community search, remains underexplored. Community search, a fundamental task in
graph analysis, aims to identify groups of nodes with dense interconnections, which
is crucial for understanding the macroscopic structure of graphs. In this paper, we
propose GraphCS, a comprehensive benchmark designed to evaluate the performance of
LLMs in community search tasks. Our experiments reveal that while LLMs exhibit preliminary
potential, they frequently fail to return meaningful results and suffer from output
bias. To address these limitations, we introduce CS-Agent, a dual-agent collaborative
framework to enhance LLM-based community search. CS-Agent leverages the complementary
strengths of two LLMs acting as Solver and Validator. Through iterative feedback and
refinement, CS-Agent dynamically refines initial results without fine-tuning or additional
training. After the multi-round dialogue, the Decider module selects the optimal community.
Extensive experiments demonstrate that CS-Agent significantly improves the quality
and stability of identified communities compared to baseline methods. To our knowledge,
this is the first work to apply LLMs to community search, bridging the gap between
LLMs and graph analysis while providing a robust and adaptive solution for real-world
applications.
Geometric Heterogeneous Graph Neural Network for Protein-Ligand Binding Affinity Prediction
- Feng Huang
- Yuhang Xia
- Ziyan Wang
- Liuqing Yang
- Wen Zhang
Accurately predicting protein-ligand binding affinity (PLA) remains a critical challenge
in structure-based drug discovery. Recent advances have focused on using geometry-aware
graph neural networks to model the three-dimensional (3D) structure of protein-ligand
complexes for PLA prediction. However, they still achieve suboptimal performance due
to two potential issues. 1) Representing the protein-ligand complex as a homogeneous
graph ignores the inherent difference between intra- and intermolecular interactions,
limiting the expressive ability of the models. 2) Given that the geometric complementarity
between the ligand and protein binding pocket serves as a fundamental determinant
of binding strength, the incomplete exploitation of geometric information constrains
the predictive performance. In this study, we propose a novel Geometric Heterogeneous
Graph Neural Network (GeoHGN) for PLA prediction. Specifically, we consider complete
geometries to characterize the directions of edges in coordinate space through quantum-inspired
basis functions. To sufficiently incorporate the 3D information and heterogeneous
topology of the complexes, we elaborately design a novel heterogeneous directional
message passing mechanism (HDMP), which enables the propagation and aggregation of
messages from intra- and intermolecular neighbors along with the directional information
of linked edges. Extensive benchmarking experiments demonstrate the superiority of
GeoHGN in predicting PLA.
Flexiffusion: Training-Free Segment-Wise Neural Architecture Search for Efficient
Diffusion Models
- Hongtao Huang
- Xiaojun Chang
- Lina Yao
Diffusion models (DMs) are powerful generative models capable of producing high-fidelity
images but are constrained by high computational costs due to iterative multi-step
inference. While Neural Architecture Search (NAS) can optimize DMs, existing methods
are hindered by retraining requirements, exponential search complexity from step-wise
optimization, and slow evaluation relying on massive image generation. To address
these challenges, we propose Flexiffusion, a training-free NAS framework that jointly
optimizes generation schedules and model architectures without modifying pre-trained
parameters. Our key insight is to decompose the generation process into flexible segments
of equal length, where each segment dynamically combines three step types: full (complete
computation), partial (cache-reused computation), and null (skipped computation).
This segment-wise search space reduces the candidate pool exponentially compared to
step-wise NAS while preserving architectural diversity. Further, we introduce relative
FID (rFID), a lightweight evaluation metric for NAS that measures divergence from
a teacher model's outputs instead of ground truth, slashing evaluation time by over
90%. In practice, Flexiffusion achieves at least 2× acceleration across LDMs, Stable
Diffusion, and DDPMs on ImageNet and MS-COCO, with FID degradation under 5%, outperforming
prior NAS and caching methods. Notably, it attains 5.1× speedup on Stable Diffusion
with near-identical CLIP scores. Our work pioneers a resource-efficient paradigm for
searching high-speed DMs without sacrificing quality.
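The search-space reduction claimed above is easy to quantify; the back-of-the-envelope sketch below uses illustrative numbers of steps and segments (assumptions, not the paper's settings) to contrast step-wise and segment-wise candidate pools over the three step types.

```python
from itertools import product

# With T denoising steps and 3 step types (full / partial / null), a step-wise
# search has 3**T candidates; grouping steps into S equal segments that each
# pick one type leaves only 3**S.
T, S = 20, 5
print(f"step-wise: {3 ** T:,} candidates vs segment-wise: {3 ** S:,}")

# Enumerating the segment-wise space is cheap enough to scan exhaustively.
schedules = list(product(["full", "partial", "null"], repeat=S))
print(len(schedules), "schedules, e.g.", schedules[0])
```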
Enhancing Multimodal Entity Linking via Distillation and Multimodal Large Language
Models
- Jintao Huang
- Dong Wang
- Shasha Li
- Yuanxi Peng
- Ruochun Jin
Multimodal entity linking (MEL) aims to link ambiguous multimodal mentions to their
corresponding entities in a multimodal knowledge graph. Although many existing methods
have been dedicated to exploring fine-grained intra- and cross-modal interactions
between mentions and entities and have achieved good results, the discrepancies between
the data distributions in training and real-world applications, as well as the noisy
one-hot labels, still impede the generalization of MEL models, which leads to poor
performance when encountering unseen entities. Although general-purpose multimodal
large language models (MLLMs) are powerful, it is costly and time-consuming to apply
them directly to the MEL task. To address the above issues, we propose a Distillation-Enhanced framework for Multimodal Entity Linking (DEMEL). During training, DEMEL takes the best-trained MEL model so far as the teacher model,
and distills the knowledge of the teacher model into the student model, i.e., the MEL
model of the current iteration, when training it with one-hot labels. This imposes
regularization on the model, balances bias and variance in the training process, and
improves the generalization ability of the MEL model. Moreover, DEMEL employs an MLLM
to selectively rerank predictions for uncertain samples in the inference phase, improving
accuracy while minimizing invocation costs. Extensive experiments on three public
MEL datasets demonstrate that DEMEL outperforms state-of-the-art baselines, achieving
3.27% improvement with MLLM reranking for just 8.59% of test samples, and up to 4.8%
H@1 enhancement in low-resource settings even without using MLLM reranking.
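As a generic sketch of the self-distillation idea (our own simplification, not DEMEL's exact objective), the loss below combines cross-entropy on one-hot labels with a temperature-scaled KL term toward the soft predictions of the best teacher checkpoint so far; the mixing weight and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, tau=2.0):
    # Hard-label term plus soft-label term against the frozen best-so-far teacher.
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau * tau
    return (1 - alpha) * ce + alpha * kl

student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)            # stand-in for the teacher's outputs
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```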
Hyperspherical Dynamic Multi-Prototype with Arguments Dependencies and Role Consistency
for Event Argument Extraction
- Xiaojia Huang
- Ruifang He
- Fei Huang
- Bo Wang
- Sen Yao
- Xiaohong Li
Event Argument Extraction (EAE) aims to identify arguments and assign them to predefined
roles within a document. Existing methods face challenges in modeling intra-class
variance and inter-class ambiguity, hindering accurate role assignment. Inspired by
how humans dynamically adjust classification criteria while maintaining category consistency
(e.g., distinguishing ''Victim'' and ''Attacker'' roles based on contextual relationships),
we propose HDMAR (Hyperspherical Dynamic Multi-Prototype with Arguments Dependencies and Role Consistency), where three innovations tackle these challenges: (1) Hyperspherical dynamic
multi-prototype learning is used to capture intra-role diversity and enforce inter-role
separation via hyperspherical optimization and optimal transport, (2) cross-event
role consistency is used to align role representations across events, and (3) an arguments
dependencies-guided encoding module enhances contextual understanding of intra-event
and inter-event dependencies. Experiments on RAMS and WikiEvents demonstrate gains
in accuracy, with further analysis validating the contributions of each module.
OFIA: An Object-centric Fine-grained Alignment Enhancement for Video-Text Retrieval
- Zhengqi Huang
- Wei Li
- Chuang Dong
- Mingxin Liu
Text-video alignment is crucial for text-video retrieval. Empirical studies have suggested
that coarse-grained alignment overlooks the rich cross-modal details and therefore
performs worse than fine-grained alignment. In the current transfer learning paradigm,
although researchers primarily utilize patch-level or frame-level embedding as the
fine-grained video representation for alignment, frame embeddings lose informative
visual details, while a single patch captures only limited local information. These
defects hinder the potential improvement in video-text retrieval. To address the defects
of the existing fine-grained alignment approach, this paper proposes the Object-centric
Fine-grained Alignment Enhancement for Video-Text Retrieval, namely OFIA, which consists
of a text-guided object-text alignment module and a similarity-wise frame aggregation
module to enhance video-text alignment. The text-guided object-text alignment module
leverages the textual descriptions to detect and extract more relevant objects, enabling
more precise local similarities within video frames. However, not all frames contribute
equally to effective alignments. The similarity-wise frame aggregation module assigns
greater importance to informative frames, overcoming the challenge of insignificant
ones and optimizing the similarity of matched video-text pairs. The empirical evaluation
on benchmark datasets including MSRVTT, MSVD, DiDeMo, and ActivityNet demonstrates
state-of-the-art performance of the proposed method.
Unlocking the Potential of Smaller Language Models as Superior Instruction Evolvers
- Tingfeng Hui
- Lulu Zhao
- Guanting Dong
- Yaqi Zhang
- Sen Su
Instruction tuning has become a cornerstone for unlocking the full potential of large
language models. Among the key factors, complex and diverse instructions play a crucial
role in aligning these models with a wide range of downstream tasks. However, current
methodologies for constructing large-scale instruction datasets tend to favor powerful
models, such as GPT-4, based on the empirical assumption that larger models inherently
possess superior capabilities. In this study, we challenge this prevailing assumption
and delve into the untapped potential of smaller language models (SLMs) in the context
of instruction evolution. Through extensive experiments across three distinct scenarios
of instruction evolution, we find that SLMs can generate more effective instructions
compared to their larger counterparts. Further analysis reveals that SLMs exhibit
a broader output space during instruction evolution, leading to the creation of more
complex and diverse instructional variants. Additionally, we observe that existing
evaluation metrics fall short in capturing the nuanced impact of instructions. To
address this limitation, we propose Instruction Complex-Aware IFD (IC-IFD), an enhanced
framework that incorporates instruction complexity into the original IFD score. This
approach enables a more accurate assessment of the effectiveness of instruction data,
paving the way for more refined instruction tuning strategies.
Scaling Trust: Veracity-Driven Defect Detection in Entity Search
- Ornella Irrera
- Stefano Marchesin
- Gianmaria Silvello
- Omar Alonso
Veracity is a critical dimension of data quality that directly impacts a wide range
of tasks. In entity search scenarios, Knowledge Graphs (KGs) such as DBpedia and Wikidata
serve as core resources for accessing factual content. The veracity of these KGs is
therefore essential for ensuring the reliability and trustworthiness of retrieved
entities -- factors that directly influence user confidence in the search system.
However, ensuring the truthfulness of entities remains a major challenge due to the
complexities associated with the scale, development, and maintenance of KGs.
This paper critically analyzes the impact of veracity in entity search, using DBpedia
as the underlying KG. To this end, we introduce eRank, a veracity-driven re-ranking
strategy that enhances entities' trustworthiness without sacrificing the ranking's
overall relevance. Furthermore, we propose the Active Learning-based verAcity-Driven
Defect IdentificatioN (ALADDIN) system, a lightweight and scalable framework for veracity-driven
defect detection. ALADDIN identifies incorrect KG facts and exhibits high effectiveness
in downstream entity-centric tasks, such as entity summarization, entity card generation,
and defect recommendation.
Deep Modality-Disentangled Prompt Tuning for Few-Shot Multimodal Sarcasm Detection
- Soumyadeep Jana
- Abhrajyoti Kundu
- Sanasam Ranbir Singh
The growing use of multimodal content on social media has sparked interest in sarcasm
detection for better opinion mining. However, current models depend heavily on large
datasets, limiting their adaptability to real-world scenarios with limited labeled
data. Therefore, an in-depth exploration of the problem in a few-shot setting is necessary.
We propose DMDP (Deep Modality-Disentangled Prompt Tuning), a novel approach designed for few-shot multimodal sarcasm detection.
Previous few-shot approaches relied on shallow, unified prompts across modalities,
limiting their ability to model the nuanced and diverse nature of sarcasm. In contrast,
we propose gated modality-weighted dual prompts that are disentangled across text
and visual encoders, injected at deeper layers to enable hierarchical feature aggregation
and identify distinct sarcasm cases. A prompt-sharing mechanism across the layers
of each encoder facilitates the capture of features ranging from low-level cues to high-level semantics,
while a cross-modal prompt alignment facilitates subtle visual-textual interactions,
allowing the model to capture complex sarcastic cues better. Extensive experiments
on two public datasets show the superiority of our model over baselines in the few-shot
and extremely low-resource scenarios. To further validate our model's effectiveness,
we conduct a cross-dataset evaluation on two public datasets, where it consistently
outperforms baselines, highlighting strong generalization. Our code will be available
at https://github.com/mr-perplexed/dmdp.
Streamlining Feature Interactions via Selectively Crossing Vectors for Click-Through
Rate Prediction
- Byungwoo Jang
- Jinhee Park
- Eunil Park
Previous Click-Through Rate (CTR) prediction models rely on enumerating high-order
feature combinations up to a fixed order, limiting expressiveness and scalability.
Recent studies explored arbitrary-order interaction modeling through two major paradigms:
log-based and graph-based methods. However, both paradigms suffer from inherent weaknesses:
log-based methods lack stability, and graph-based methods lack generalizability, as
both attempt to model overly diverse combinations of features, many of which may be
noisy or redundant. This observation provokes a central question: What if only a small
set of core interactions is sufficient? To explore this, we progressively mask feature
interactions and find that removing up to 90% of them results in negligible performance
degradation. This suggests that most interactions are unnecessary. Motivated by this
finding, we propose SCV: Selectively Crossing Vectors, a CTR prediction framework
that reformulates feature interaction learning as a sparse edge selection task over
a globally shared feature-interaction graph. By modeling feature interactions over
a globally learned graph and dynamically fusing expert outputs in an instance-aware
manner, the SCV effectively leverages global consistency and local adaptability. We
further introduce a label-biased self-distillation objective to mitigate the effects
of noisy supervision and stabilize training. Experiments on public CTR benchmarks
show that SCV achieves state-of-the-art performance while reducing computational cost
by up to 66%, validating the effectiveness of globally sparse yet locally adaptive
interaction modeling. All codes are available at: https://github.com/bw-99/scv.
Leveraging Vulnerabilities in Temporal Graph Neural Networks via Strategic High-Impact
Assaults
- Dong Hyun Jeon
- Lijing Zhu
- Haifang Li
- Pengze Li
- Jingna Feng
- Tiehang Duan
- Houbing Herbert Song
- Cui Tao
- Shuteng Niu
Temporal Graph Neural Networks (TGNNs) have become indispensable for analyzing dynamic
graphs in critical applications such as social networks, communication systems, and
financial networks. However, the robustness of TGNNs against adversarial attacks,
particularly sophisticated attacks that exploit the temporal dimension, remains a
significant challenge. Existing attack methods for Spatio-Temporal Dynamic Graphs
(STDGs) often rely on simplistic, easily detectable perturbations (e.g., random edge
additions/deletions) and fail to strategically target the most influential nodes and
edges for maximum impact. We introduce the High Impact Attack (HIA), a novel restricted
black-box attack framework specifically designed to overcome these limitations and
expose critical vulnerabilities in TGNNs. HIA leverages a data-driven surrogate model
to identify structurally important nodes (central to network connectivity) and dynamically
important nodes (critical for the graph's temporal evolution). It then employs a hybrid
perturbation strategy, combining strategic edge injection (to create misleading connections)
and targeted edge deletion (to disrupt essential pathways), maximizing TGNN performance
degradation. Importantly, HIA minimizes the number of perturbations to enhance stealth,
making it more challenging to detect. Comprehensive experiments on five real-world
datasets and four representative TGNN architectures (TGN, JODIE, DySAT, and TGAT)
demonstrate that HIA significantly reduces TGNN accuracy on the link prediction task,
achieving up to a 35.55% decrease in Mean Reciprocal Rank (MRR), a substantial improvement
over state-of-the-art baselines. These results highlight fundamental vulnerabilities
in current STDG models and underscore the urgent need for robust defenses that account
for both structural and temporal dynamics. Code and Data are available at https://github.com/ryandhjeon/hia.
Identifying Critical Segments Affecting Piano Performance Evaluation
Effective evaluation of expressive music performance should not only assess technical
and interpretive quality but also support efficient practice by identifying musically
critical segments. However, traditional expert feedback is often limited by accessibility
and scalability. To address this, we investigate how musically critical segments can
be automatically identified and interpreted to support personalized assessment of
piano performance. We propose a framework that identifies expressive deviations in
piano performance by comparing extracted feature values with reference guidelines
derived from music sheet analysis and expert annotations. Critical segments are detected
via SHAP-based feature importance and change-point analysis, while interpretable annotation
is generated using a large language model, conditioned on feature descriptions and
quantified deviations. We evaluate our framework on the PercePiano dataset and newly
collected annotations, showing consistent improvements in the predicted overall performance
scores after applying the generated annotations. Annotations generated
using a finetuned feature extraction model improved predicted scores by up to 8.76%,
with greater alignment with expert labels in both segment coverage and overlap. SHAP-based
analysis confirms that the model identifies musically important features, enhancing
both interpretability of annotation and its relevance to musical evaluation. Our results
demonstrate that the proposed framework produces interpretable, musically meaningful
annotations aligned with expert evaluations, and can serve as a foundation for scalable,
AI-assisted music education and assessment. Our code is available at: https://github.com/Hyerim-Jeon/critical_segments.
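As a simplified sketch of the deviation idea (a hypothetical feature and threshold, not the paper's SHAP- and change-point-based pipeline), the snippet below compares an extracted expressive feature per segment against a reference guideline and flags the segments whose deviation is largest.

```python
import numpy as np

reference = np.array([0.60, 0.62, 0.65, 0.64, 0.63, 0.61, 0.60, 0.62])  # e.g. target tempo ratio per segment
performed = np.array([0.61, 0.63, 0.80, 0.79, 0.64, 0.60, 0.45, 0.61])  # values extracted from a performance

deviation = np.abs(performed - reference)
threshold = deviation.mean() + deviation.std()          # simple data-driven cutoff
critical = np.where(deviation > threshold)[0]
print("critical segments:", critical.tolist(), "deviations:", deviation[critical].round(2).tolist())
```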
ST-LINK: Spatially-Aware Large Language Models for Spatio-Temporal Forecasting
- Hyotaek Jeon
- Hyunwook Lee
- Juwon Kim
- Sungahn Ko
Traffic forecasting represents a crucial problem within intelligent transportation
systems. In recent research, Large Language Models (LLMs) have emerged as a promising
method, but their intrinsic design, tailored primarily for sequential token processing,
introduces notable challenges in effectively capturing spatial dependencies. Specifically,
the inherent limitations of LLMs in modeling spatial relationships and their architectural
incompatibility with graph-structured spatial data remain largely unaddressed. To
overcome these limitations, we introduce ST-LINK, a novel framework that enhances
the capability of Large Language Models to capture spatio-temporal dependencies. Its
key components are Spatially-Enhanced Attention (SE-Attention) and the Memory Retrieval
Feed-Forward Network (MRFFN). SE-Attention extends rotary position embeddings to integrate
spatial correlations as direct rotational transformations within the attention mechanism.
This approach maximizes spatial learning while preserving the LLM's inherent sequential
processing structure. Meanwhile, MRFFN dynamically retrieves and utilizes key historical
patterns to capture complex temporal dependencies and improve the stability of long-term
forecasting. Comprehensive experiments on benchmark datasets demonstrate that ST-LINK
surpasses conventional deep learning and LLM approaches, and effectively captures
both regular traffic patterns and abrupt changes.
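Since SE-Attention is described as extending rotary position embeddings, the sketch below shows plain RoPE over 1-D positions for reference (the spatial extension itself is not reproduced here, and the shapes and base are assumptions).

```python
import torch

def rotary_embed(x, positions, base=10000.0):
    """x: (seq, dim) with even dim; positions: (seq,) integer positions."""
    dim = x.shape[-1]
    half = dim // 2
    freqs = 1.0 / (base ** (torch.arange(half, dtype=torch.float32) / half))
    angles = positions.float()[:, None] * freqs[None, :]          # (seq, half)
    cos, sin = torch.cos(angles), torch.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(8, 16)
q_rot = rotary_embed(q, torch.arange(8))
# The rotation preserves per-token norms; dot products between rotated queries
# and keys then depend only on their relative positions.
assert torch.allclose(q.norm(dim=-1), q_rot.norm(dim=-1), atol=1e-5)
```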
Parameter-Efficient Transfer Learning for EEG Foundation Models via Task-Relevant
Feature Focusing
- Jaehyun Jeon
- Seungwoo Jeong
- Yeajin Shon
- Heung-Il Suk
Electroencephalogram (EEG)-based brain-computer interfaces face challenges of data
insufficiency for training task-specific neural networks and of generalizability across
multiple subjects. To address the issue of data scarcity in EEG research, transfer learning
using EEG foundation models (EFMs) has recently gained attention for its ability to
leverage prior knowledge. Although transfer learning with EFMs enables tasks to be
performed with limited training data, their increasing size presents significant computational
challenges. Parameter-efficient transfer learning (PETL) methods address this computational
issue by tuning only a small subset of parameters from the pre-trained model. However,
existing PETL methods mostly fail to account for the high-dimensional nature of EEG
data, which limits their ability to fully leverage the prior knowledge of the EFM
when applied to downstream tasks. To address these challenges, we propose a novel
PETL method with a TASk-relevanT fEature Focusing modULe (TASTEFUL) to transfer EFMs
efficiently. TASTEFUL is designed to focus on task-relevant features and efficiently
learn representations tailored for downstream tasks. We evaluated our proposed TASTEFUL
on tasks using publicly available EEG datasets, demonstrating its superior performance.
Finally, our work highlights TASTEFUL's potential to enhance the practical application
of EFMs, marking a significant advancement in PETL for EFMs.
Entity-Aware Generative Retrieval for Personalized Contexts
- Jihyeong Jeon
- Jiwon Lee
- Cheol Ryu
- U Kang
Given a user query containing ambiguous and user-specific references, how can we effectively
retrieve personalized information? Personalized information retrieval (PIR) requires resolving context-dependent cues
such as nicknames, personal locations, or temporal expressions. This poses challenges
for conventional retrievers, including dense and generative models, which often struggle
with entity ambiguity and generalization to user-specific contexts. In this paper,
we propose PEARL (Personalized Entity-Aware Generative RetrievaL), a novel generative retrieval framework for personalized IR. PEARL addresses key challenges through three components: (i) entity-aware annotation with
span-level regularization to reduce lexical sensitivity, (ii) prefix-based contrastive
learning to capture structural alignment between lexically divergent query-passage
pairs, and (iii) context diversification to improve robustness against user-specific
variations. Empirical results on both an existing PIR dataset and our new large-scale
synthetic benchmark PAIR show that PEARL consistently outperforms strong baselines under zero-shot evaluation. Notably, PEARL achieves the state-of-the-art performance in Hits@1 and MRR@10, demonstrating its
effectiveness for retrieval in personalized user contexts. Our dataset is available
at https://www.github.com/pearl-pair/pearl.
Seeing Through the Blur: Unlocking Defocus Maps for Deepfake Detection
The rapid advancement of generative AI has enabled the mass production of photorealistic
synthetic images, blurring the boundary between authentic and fabricated visual content.
This challenge is particularly evident in deepfake scenarios involving facial manipulation,
but also extends to broader AI-generated content (AIGC) cases involving fully synthesized
scenes. As such content becomes increasingly difficult to distinguish from reality,
the integrity of visual media is under threat. To address this issue, we propose a
physically interpretable deepfake detection framework and demonstrate that defocus
blur can serve as an effective forensic signal. Defocus blur is a depth-dependent
optical phenomenon that naturally occurs in camera-captured images due to lens focus
and scene geometry. In contrast, synthetic images often lack realistic depth-of-field
(DoF) characteristics. To capture these discrepancies, we construct a defocus blur
map and use it as a discriminative feature for detecting manipulated content. Unlike
RGB textures or frequency-domain signals, defocus blur arises universally from optical
imaging principles and encodes physical scene structure. This makes it a robust and
generalizable forensic cue. Our approach is supported by three in-depth feature analyses,
and experimental results confirm that defocus blur provides a reliable and interpretable
cue for identifying synthetic images. We aim for our defocus-based detection pipeline
and interpretability tools to contribute meaningfully to ongoing research in media
forensics. The implementation is publicly available at: https://github.com/irissun9602/Defocus-Deepfake-Detection
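A minimal, generic defocus-style blur map can be built from local sharpness statistics; the sketch below (our own illustration, not the paper's estimator) uses the local variance of the Laplacian response as a sharpness proxy, so low values indicate blurrier regions, and the synthetic test image is an assumption.

```python
import numpy as np
from scipy import ndimage

def blur_map(image, window=15):
    lap = ndimage.laplace(image.astype(np.float64))
    local_mean = ndimage.uniform_filter(lap, size=window)
    local_sq_mean = ndimage.uniform_filter(lap ** 2, size=window)
    return local_sq_mean - local_mean ** 2          # per-pixel Laplacian variance

# Synthetic test: a sharp random texture next to a heavily smoothed copy.
rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
blurred = ndimage.gaussian_filter(sharp, sigma=3)
image = np.hstack([sharp, blurred])
m = blur_map(image)
print("sharp-half variance:", m[:, :64].mean().round(4), "blurred-half:", m[:, 64:].mean().round(4))
```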
Local Large Language Models for Recommendation
- Yujin Jeon
- Jooyoung Kim
- Joonseok Lee
Unlike traditional classification tasks, recommendation is inherently subjective: whether
an item should be suggested depends not only on user preferences and item semantics,
but also on latent behavioral patterns and contextual cues. While recent LLM-based
recommenders excel at modeling semantics and intent through generative reasoning,
they often fail to capture collaborative signals and suffer from inefficiencies when
applied globally across large interaction spaces. We propose Local Large Language
Models for Recommendation (L3Rec), a novel model-agnostic framework that integrates
collaborative filtering (CF) with generative LLMs through localized modeling. Our approach
first applies a lightweight CF model to derive user and item embeddings, then clusters
them into behaviorally coherent subgroups. Each cluster is assigned a dedicated generative
LLM, referred to as a local LLM, trained only on its corresponding data subset. This
enables fine-grained personalization while improving training efficiency through parallelism.
At inference time, predictions from local models are aggregated via a fusion strategy,
with a global CF fallback when needed. To the best of our knowledge, this is the first
LLM-based recommendation framework to incorporate local collaborative structure. Experiments
show that it achieves state-of-the-art performance with significantly better scalability
and efficiency.
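The localized-modeling idea can be sketched as clustering CF-derived embeddings and routing each user to a per-cluster model; the snippet below is our own illustration (hypothetical shapes, placeholder local models, and a global fallback), not the released L3Rec code.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
user_embeddings = rng.normal(size=(1000, 32))          # e.g. from a lightweight CF model

kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(user_embeddings)

def route(user_embedding, local_models, global_model):
    # Pick the cluster-specific model if one exists, otherwise fall back globally.
    cluster = int(kmeans.predict(user_embedding[None, :])[0])
    return local_models.get(cluster, global_model)

local_models = {c: f"local_llm_{c}" for c in range(8)}   # placeholders for per-cluster LLMs
print(route(user_embeddings[0], local_models, "global_cf_fallback"))
```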
Frequency-Conditioned Diffusion Models for Time Series Generation
- Seungwoo Jeong
- Junghyo Sohn
- Jaehyun Jeon
- Heung-Il Suk
Time series data, widely used in fields such as climate studies, finance, and healthcare,
often face scarcity in rare scenarios and privacy concerns, prompting growing interest
in time series synthesis. Diffusion models have shown strong potential for generating
high-quality data, but challenges remain in capturing long-range dependencies and
complex patterns. We propose a novel diffusion model that integrates time-domain information
with rich frequency-domain features, accounting for differences in noise decay rates
across frequencies. Instead of arbitrary frequency splits used in prior works, we
partition components based on spectral density, model them separately within the denoising
backbone, and fuse them with time-domain features. This enables effective capture
of both global and local patterns, enhancing representation of high- and low-frequency
information. Extensive experiments on multiple public datasets show promising performance,
and analyses including long-term generation and ablation studies demonstrate the model's
ability to learn and represent complex time series distributions.
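To make the spectral-density partition concrete, here is a simple sketch (the exact partition rule is ours, not the paper's) that keeps the frequency bins holding a target share of the power in a "dominant" branch and leaves the remainder as a residual branch.

```python
import numpy as np

def split_by_spectral_density(x, power_share=0.9):
    spec = np.fft.rfft(x)
    power = np.abs(spec) ** 2
    order = np.argsort(power)[::-1]                     # strongest bins first
    keep = order[: np.searchsorted(np.cumsum(power[order]) / power.sum(), power_share) + 1]
    mask = np.zeros_like(spec, dtype=bool)
    mask[keep] = True
    dominant = np.fft.irfft(np.where(mask, spec, 0), n=len(x))
    residual = x - dominant
    return dominant, residual

t = np.linspace(0, 4 * np.pi, 256)
x = np.sin(t) + 0.2 * np.sin(23 * t) + 0.05 * np.random.default_rng(0).normal(size=t.size)
dominant, residual = split_by_spectral_density(x)
print("power in dominant branch:", round(float((dominant ** 2).sum() / (x ** 2).sum()), 3))
```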
From Anchors to Answers: A Novel Node Tokenizer for Integrating Graph Structure into
Large Language Models
- Yanbiao Ji
- Chang Liu
- Xin Chen
- Dan Luo
- Mei Li
- Yue Ding
- Wenqing Lin
- Hongtao Lu
Enabling large language models (LLMs) to effectively process and reason with graph-structured
data remains a significant challenge despite their remarkable success in natural language
tasks. Current approaches either convert graph structures into verbose textual descriptions,
consuming substantial computational resources, or employ complex graph neural networks
as tokenizers, which introduce significant training overhead. To bridge this gap,
we present NT-LLM, a novel framework with an anchor-based positional encoding scheme
for graph representation. Our approach strategically selects reference nodes as anchors
and encodes each node's position relative to these anchors, capturing essential topological
information without the computational burden of existing methods. Notably, we identify
and address a fundamental issue: the inherent misalignment between discrete hop-based
distances in graphs and continuous distances in embedding spaces. By implementing
a rank-preserving objective for positional encoding pretraining, NT-LLM achieves superior
performance across diverse graph tasks ranging from basic structural analysis to complex
reasoning scenarios. Our comprehensive evaluation demonstrates that this lightweight
yet powerful approach effectively enhances LLMs' ability to understand and reason
with graph-structured information, offering an efficient solution for graph-based
applications of language models.
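The anchor-based positional encoding can be sketched as: pick a few reference nodes and describe every node by its shortest-path distances to them. The anchor-selection rule (highest degree) and the use of networkx below are illustrative assumptions, and the paper's rank-preserving pretraining objective is not shown.

```python
import networkx as nx
import numpy as np

def anchor_positional_encoding(G: nx.Graph, num_anchors: int = 4) -> np.ndarray:
    """Encode each node by its hop distances to a small set of anchor nodes."""
    # Assumed heuristic: use the highest-degree nodes as anchors.
    anchors = [n for n, _ in sorted(G.degree, key=lambda x: -x[1])[:num_anchors]]
    nodes = list(G.nodes)
    enc = np.full((len(nodes), num_anchors), np.inf)
    for j, a in enumerate(anchors):
        dist = nx.single_source_shortest_path_length(G, a)
        for i, n in enumerate(nodes):
            enc[i, j] = dist.get(n, np.inf)
    return enc  # one row per node; rows would feed the LLM as positional features

G = nx.karate_club_graph()
print(anchor_positional_encoding(G)[:3])
```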
Adapting Large Language Models to Log Analysis with Interpretable Domain Knowledge
- Yuhe Ji
- Yilun Liu
- Feiyu Yao
- Minggui He
- Shimin Tao
- Xiaofeng Zhao
- Chang Su
- Xinhua Yang
- Weibin Meng
- Yuming Xie
- Boxing Chen
- Shenglin Zhang
- Yongqian Sun
Log analysis represents a critical sub-domain within AI applications that facilitates
automatic approaches to fault and error management of large-scale software systems,
saving the labor of traditional manual methods. While existing solutions using large
language models (LLMs) show promise, they are limited by a significant domain gap
between natural and log languages (the latter contains rich domain-specific tokens
such as status codes, IP addresses, and resource paths), which restricts their effectiveness
in real-world applications. However, directly adapting general-purpose LLMs to log
analysis using raw logs may degrade their performance due to inconsistent token distribution.
In this paper, we present a domain adaptation approach that addresses these limitations
by integrating interpretable domain knowledge into open-source LLMs through continual
pre-training (CPT), which bridges this domain gap by adapting LLMs on interpretable
natural texts with log knowledge (instead of raw logs) to reduce distribution discrepancy.
To achieve this, we developed NLPLog, a comprehensive dataset containing over 250,000
question-answer pairs on log-related knowledge. Our resulting model, SuperLog, achieves
the best performance across four log analysis tasks, with an average accuracy improvement
of 12.01% over the second-best model. An ablation study also suggests advantages of
domain adaptation using interpretable log knowledge over using raw logs.
SELF: Surrogate-light Feature Selection with Large Language Models in Deep Recommender
Systems
- Pengyue Jia
- Zhaocheng Du
- Yichao Wang
- Xiangyu Zhao
- Xiaopeng Li
- Yuhao Wang
- Qidong Liu
- Huifeng Guo
- Ruiming Tang
Feature selection is crucial in recommender systems for improving model efficiency
and predictive performance. Conventional approaches typically employ surrogate models,
such as decision trees or neural networks, to estimate feature importance. However, their
effectiveness is inherently constrained, as these models may struggle under suboptimal
training conditions, including feature collinearity, high-dimensional sparsity, and
insufficient data. In this paper, we propose SELF, a SurrogatE-Light Feature selection
method for deep recommender systems. SELF integrates semantic reasoning from Large
Language Models (LLMs) with task-specific learning from surrogate models, enabling
an automated and lightweight feature selection process. Specifically, LLMs first produce
a semantically informed ranking of feature importance, which is subsequently refined
by a surrogate model, effectively integrating general world knowledge with task-specific
learning. Comprehensive experiments on three public datasets from real-world recommender
platforms validate the effectiveness of SELF. To facilitate reproducibility, our code
is publicly available.
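One way to read the two-stage design (an LLM-provided importance ranking refined by a surrogate model) is as a score-fusion step. The simple weighted fusion below is an assumed illustration under made-up feature names and a made-up mixing weight, not the paper's procedure.

```python
import numpy as np

def fuse_rankings(llm_rank: list[str], surrogate_score: dict[str, float],
                  alpha: float = 0.5) -> list[str]:
    """Blend an LLM-provided feature ranking with surrogate-model importances.

    Both signals are converted to [0, 1] scores and mixed with weight `alpha`
    (an assumed hyperparameter); higher combined score = more important feature.
    """
    n = len(llm_rank)
    llm_score = {f: (n - i) / n for i, f in enumerate(llm_rank)}
    s_vals = np.array([surrogate_score[f] for f in llm_rank], dtype=float)
    s_norm = (s_vals - s_vals.min()) / (s_vals.max() - s_vals.min() + 1e-12)
    combined = {f: alpha * llm_score[f] + (1 - alpha) * s
                for f, s in zip(llm_rank, s_norm)}
    return sorted(llm_rank, key=lambda f: -combined[f])

# Hypothetical features and scores, purely for illustration.
print(fuse_rankings(["age", "price", "category"],
                    {"age": 0.1, "price": 0.7, "category": 0.4}))
```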
SUMMA: A Multimodal Large Language Model for Advertisement Summarization
- Weitao Jia
- Shuo Yin
- Zhoufutu Wen
- Han Wang
- Zehui Dai
- Kun Zhang
- Zhenyu Li
- Tao Zeng
- Xiaohui Lv
Understanding multimodal video ads is crucial for improving query-ad matching and
relevance ranking on short video platforms, enhancing advertising effectiveness and
user experience. However, the effective utilization of multimodal information with
high commercial value has long been inadequate, largely constrained by reliance on
highly compressed video embeddings. To address this, we propose SUMMA (the abbreviation
of SUmmarizing MultiModal Ads), a multimodal model that automatically processes video
ads into summaries highlighting the content of highest commercial value, thus improving
their comprehension and ranking in Douyin search-advertising systems. SUMMA is developed
via a two-stage training strategy (multimodal supervised fine-tuning followed by
reinforcement learning with a mixed reward mechanism) on domain-specific data containing
video frames and ASR/OCR
transcripts, generating commercially valuable and explainable summaries. We integrate
SUMMA-generated summaries into our production pipeline, directly enhancing the candidate
retrieval and relevance ranking stages in real search-advertising systems. Both offline
and online experiments show substantial improvements over baselines, with online results
indicating a statistically significant 1.5% increase in advertising revenue. Our work
establishes a novel paradigm for condensing multimodal information into representative
texts, effectively aligning visual ad content with user query intent in retrieval
and recommendation scenarios.
Chart-CoCa: Self-Improving Chart Understanding of Vision LMs via Code-Driven Synthesis
and Candidate-Conditioned Answering
Vision Language Models (VLMs) often struggle with chart understanding tasks, particularly
in accurate chart description and complex reasoning. Synthetic data generation is
a promising solution, but it usually faces the challenge of noisy labels. To address
this challenge, we first introduce a chart synthesis pipeline that generates aligned
chart-question-answer triplets through code generation and execution, ensuring the
reliability of synthetic data without human intervention. Furthermore, inspired by
test-time scaling that increases inference budget and thereby improves performance,
we design a candidate-conditioned answering process. The VLM first generates multiple
responses per query, and then synthesizes the final answer by contextualizing these
candidates. Experiments demonstrate significant improvements, with up to 15.50 points
accuracy gain over the initial VLM, in a fully self-improving paradigm without either
human-labeled data or external models.
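The code-driven synthesis idea (generate a chart by executing plotting code, so the answer label is correct by construction) can be illustrated as follows. The chart type, question template, and file name are assumptions, not the paper's actual synthesis pipeline.

```python
import random
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt

def synthesize_bar_chart_qa(path: str = "chart.png"):
    """Create one chart-question-answer triplet whose label is right by construction."""
    labels = ["A", "B", "C", "D"]
    values = [random.randint(1, 100) for _ in labels]
    plt.figure()
    plt.bar(labels, values)
    plt.title("Synthetic sales by region")
    plt.savefig(path)
    plt.close()
    question = "Which region has the highest sales?"
    answer = labels[values.index(max(values))]  # ground truth taken from the data itself
    return path, question, answer

print(synthesize_bar_chart_qa())
```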
Community-Aware Social Community Recommendation
- Runhao Jiang
- Renchi Yang
- Wenqing Lin
Social recommendation, which seeks to leverage social ties among users to alleviate
the sparsity issue of user-item interactions, has emerged as a popular technique for
elevating personalized services in recommender systems. Despite being effective, existing
social recommendation models are mainly devised for recommending regular items such
as blogs, images, and products, and largely fail for community recommendations due
to overlooking the unique characteristics of communities. Distinctly, communities
are constituted by individuals, who exhibit high dynamicity and relate to rich structural
patterns in social networks. To our knowledge, limited research has been devoted to
comprehensively exploiting this information for recommending communities.
To bridge this gap, this paper presents CASO, a novel and effective model specially designed for social community recommendation.
Under the hood, CASO harnesses three carefully-crafted encoders for user embedding, wherein two of them
extract community-related global and local structures from the social network via
social modularity maximization and social closeness aggregation, while the third one captures user preferences using collaborative filtering with
observed user-community affiliations. To further eliminate feature redundancy therein,
we introduce a mutual exclusion between social and collaborative signals. Finally,
CASO includes a community detection loss in the model optimization, thereby producing
community-aware embeddings for communities. Our extensive experiments evaluating CASO against nine strong baselines on six real-world social networks demonstrate its consistent
and remarkable superiority over the state of the art in terms of community recommendation
performance.
Hierarchy-Consistent Learning and Adaptive Loss Balancing for Hierarchical Multi-Label
Classification
- Ruobing Jiang
- Mengzhe Liu
- Haobing Liu
- Yanwei Yu
Hierarchical Multi-Label Classification (HMC) faces critical challenges in maintaining
structural consistency and balancing loss weighting in Multi-Task Learning (MTL).
In order to address these issues, we propose a classifier called HCAL based on MTL
integrated with prototype contrastive learning and adaptive task-weighting mechanisms.
The most significant advantage of our classifier is semantic consistency, achieved
both through prototypes that explicitly model labels and through feature aggregation
from child classes to parent classes. The other important advantage is an adaptive loss-weighting mechanism
that dynamically allocates optimization resources by monitoring task-specific convergence
rates. It effectively resolves the ''one-strong-many-weak'' optimization bias inherent
in traditional MTL approaches. To further enhance robustness, a prototype perturbation
mechanism is formulated by injecting controlled noise into prototypes to expand decision
boundaries. Additionally, we formalize a quantitative metric called Hierarchical Violation
Rate (HVR) to evaluate hierarchical consistency and generalization. Extensive experiments
across three datasets demonstrate both the higher classification accuracy and reduced
hierarchical violation rate of the proposed classifier over baseline models.
Exploring the Tradeoff Between Diversity and Discrimination for Continuous Category
Discovery
- Ruobing Jiang
- Yang Liu
- Haobing Liu
- Yanwei Yu
- Chunyang Wang
Continuous category discovery (CCD) aims to automatically discover novel categories
in continuously arriving unlabeled data. This is a challenging problem considering
that neither the number of categories nor the labels of the newly arrived data are
known, while catastrophic forgetting also needs to be mitigated. Most CCD methods cannot handle the
contradiction between novel class discovery and classification well. They are also
prone to accumulate errors in the process of gradually discovering novel classes.
Moreover, most of them use knowledge distillation and data replay to prevent forgetting,
occupying more storage space. To address these limitations, we propose Independence-based
Diversity and Orthogonality-based Discrimination (IDOD). IDOD mainly includes an independent
enrichment of diversity module, a joint discovery of novelty module, and a continuous
increment by orthogonality module. In independent enrichment, the backbone is trained
separately using a contrastive loss to prevent it from focusing only on features useful for classification.
Joint discovery transforms multi-stage novel class discovery into single-stage, reducing
error accumulation impact. Continuous increment by orthogonality module generates
mutually orthogonal prototypes for classification and prevents forgetting with lower
space overhead via representative representation replay. Experimental results show
that on challenging fine-grained datasets, our method outperforms the state-of-the-art
methods.
Balance and Brighten: A Twin-Propeller Network to Release Potential of Physics Laws
for Traffic State Estimation
- Weihao Jiang
- Yao Fu
- Hong Zhao
- Xiaoyu Cai
- Ruiheng Yang
- Linsen Li
- Jiang Zhu
Traditional physics-informed deep learning combines data-driven methods with
model-based methods by incorporating a physics loss as a constraint in the total loss function,
which aims to enforce the neural network to behave according to the physics laws.
However, the potential of physical knowledge is severely underestimated by this approach.
Firstly, the physical knowledge fails to demonstrate its intended effects since the
physics loss could have extremely small magnitude, more fluctuating convergence rates,
and conflicting directions of the gradients compared to the data loss. Secondly, existing
methods implicitly employ physics laws as auxiliary terms, which ignores that explicitly
utilizing certain properties of physics laws can compensate for the shortcomings of
data-driven models, particularly with regard to the data noise and relationships between
variables. To alleviate these issues, we propose a Twin-Propeller Network (TPN) that realizes full message exchange between physical knowledge and data information,
releasing the potential of the physics laws. Practically, we independently train a
data-driven model and a physics-based model as two student models to keep their information
separated. Considering the measurement noise present in the data-driven model and
the relatively robust physics-based model, we quantify the data uncertainty and utilize
it as a weight to balance the two students in an integrated robust teacher model. The
stronger teacher in turn transfers the respective knowledge back to each student, where
we innovatively propose traffic state relation distillation and physical knowledge
distillation to guide the training of the data student and the physics student respectively.
Through extensive experiments on both synthetic and real-world datasets, our model
demonstrates better performance than the existing state-of-the-art methods.
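The teacher construction (balancing the data student and the physics student by data uncertainty) can be sketched as an uncertainty-weighted combination. The weighting formula and the example values below are assumptions standing in for the paper's actual mechanism.

```python
import numpy as np

def teacher_prediction(data_pred: np.ndarray, physics_pred: np.ndarray,
                       data_var: np.ndarray) -> np.ndarray:
    """Blend the two students into a teacher, trusting the data-driven student
    less where its estimated uncertainty (variance) is high.

    Assumed scheme: weight = 1 / (1 + variance); the remainder goes to physics.
    """
    w_data = 1.0 / (1.0 + data_var)
    return w_data * data_pred + (1.0 - w_data) * physics_pred

rho_data = np.array([0.31, 0.28, 0.40])   # hypothetical traffic-state estimates
rho_phys = np.array([0.30, 0.27, 0.35])
var = np.array([0.05, 0.50, 2.00])        # higher value -> noisier measurement
print(teacher_prediction(rho_data, rho_phys, var))
```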
Dangerous Language Habits! Exploiting Code-Mixing for Backdoor Attacks on NLP Models
- Haotian Jin
- Haihui Fan
- Jinchao Zhang
- Yang Li
- Bo Li
- Junhao Zhou
Backdoor attacks threaten the reliability of NLP models by embedding hidden behaviors
during training, which are activated by specific inputs at inference time. Traditional
backdoor triggers often rely on explicit content alterations, such as token insertion
or stylistic modification, which may compromise semantic coherence and be easily detected. In
this work, we propose a novel backdoor attack strategy that leverages the linguistic
properties of code-mixing (a language form that combines elements from two or more
languages) as implicit triggers. Drawing inspiration from natural code-mixing communication,
we design three types of linguistically grounded triggers: inter-word mixing, intra-sentential
mixing, and inter-sentential mixing. These forms reflect realistic language usage
patterns in bilingual communities, enhancing the stealthiness of the attack. The experiment
results show that existing NLP models perform poorly when faced with backdoor attacks
based on code-mixing triggers. We are the first to focus on code-mixing as a trigger
for text backdoor attacks. We hope this research raises awareness of the vulnerability
of models during training when faced with code-mixing.
Point-DMAE: Point Cloud Self-supervised Learning via Density-directed Masked Autoencoders
- Xianglong Jin
- Zheng Wang
- Wenjie Zheng
- Feiping Nie
Masked autoencoders have been extensively utilized in 3D point cloud self-supervised
learning, where the fundamental approach involves masking a portion of the point cloud
and subsequently reconstructing it. This process is hypothesized to enhance model
learning by leveraging the inherent structure of the point cloud data. However, the
information density within point clouds is inherently uneven, contrasting with the
more uniform distributions found in language and 2D image data. This uneven distribution
suggests that the application of random masking strategies, commonly adopted from
NLP and 2D vision, may not be optimal for point cloud data, potentially leading to
suboptimal learning outcomes. Based on this observation, we propose a simple yet effective
Density-directed Masked Autoencoders for Point Cloud Self-supervised Learning (Point-DMAE),
which learns latent semantic point cloud features using a density-directed masking
strategy. Specifically, our method employs a dual-branch Transformer architecture
to extract both high-level and fine-grained point features through global and local
block density-directed masking, respectively. Point-DMAE demonstrates high pre-training
efficiency and significantly outperforms our baseline (Point-MAE) on 3D object classification
tasks within the ScanObjectNN dataset by 4.13% on OBJ-BG, 5.17% on OBJ-ONLY, and 4.17%
on PB-T50-RS. Codes are available at https://github.com/jinxianglong10/Point-DMAE.
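A density-directed masking strategy can be sketched by estimating each point's local density from its k-nearest-neighbor distances and biasing the masking probability accordingly. The density proxy, the direction of the bias, and the parameters below are illustrative assumptions, not the paper's exact strategy.

```python
import numpy as np
from scipy.spatial import cKDTree

def density_directed_mask(points: np.ndarray, mask_ratio: float = 0.6,
                          k: int = 16, seed: int = 0) -> np.ndarray:
    """Return a boolean mask biased toward masking high-density points.

    Density is the inverse mean distance to the k nearest neighbors (an assumed
    proxy); denser points receive proportionally higher masking probability.
    """
    rng = np.random.default_rng(seed)
    dists, _ = cKDTree(points).query(points, k=k + 1)  # first column is the point itself
    density = 1.0 / (dists[:, 1:].mean(axis=1) + 1e-8)
    prob = density / density.sum()
    n_mask = int(mask_ratio * len(points))
    masked_idx = rng.choice(len(points), size=n_mask, replace=False, p=prob)
    mask = np.zeros(len(points), dtype=bool)
    mask[masked_idx] = True
    return mask

pts = np.random.default_rng(1).normal(size=(1024, 3))
print(density_directed_mask(pts).sum())  # number of masked points
```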
Enhancing Information Diffusion Prediction via Multiple Granularity Hypergraphs and
Position-aware Sequence Model
- Weikai Jing
- Yuchen Wang
- Haotong Du
- Songxin Wang
- Xiaoyu Li
- Chao Gao
With the rise of social media, accurately predicting information diffusion has become
crucial for a wide range of applications. Existing methods usually employ sequential
hypergraphs to model users' latent interaction preferences and use self-attention
mechanisms to capture dependencies among users. However, they typically focus on a
single temporal scale and lack the ability to effectively model temporal influence,
which limits their performance in diffusion prediction tasks. To address these limitations,
we propose a novel method (MHPS) to enhance information diffusion prediction via multiple
granularity hypergraphs and a position-aware sequence model. Specifically, MHPS constructs
hypergraph sequences of different granularities by grouping user interactions according
to various time intervals. Additionally, to further enhance the modeling of temporal
influence, two types of cross-attention mechanisms, namely next-step positional cross-attention
and source influence cross-attention, are introduced within the cascade representation.
The next-step positional cross-attention captures target position awareness, while
the source influence cross-attention focuses on the impact of the initial source.
Then, gating mechanisms and GRUs are employed to fuse the different attention outputs
and predict the next target user. Extensive experiments on real-world datasets demonstrate
that MHPS achieves competitive performance against state-of-the-art methods. The average
improvements are up to 7.82% in terms of Hits@10 and 5.60% in terms of MAP@100. Our
code is available at https://github.com/cgao-comp/MHPS.
Rethinking the Training Paradigm of Discrete Token-Based Multimodal LLMs: An Analysis
of Text-Centric Bias
- Wansik Jo
- Jooyeong Na
- Soyeon Hong
- Seungtaek Choi
- Hyunsouk Cho
Discrete token-based multimodal large language models (MLLMs), such as AnyGPT and
MIO, integrate diverse modalities into an autoregressive framework by discretizing
modality inputs into tokens compatible with language models. Unlike encoder-based
approaches, such as LLaVA and Flamingo, which utilize pretrained modality-specific
encoders, discrete token-based MLLMs simultaneously learn modality token representations
and their alignment with the language, yet are exclusively trained on modality-text
paired datasets without additional unimodal training. We identify a structural limitation
inherent in this training paradigm, termed text-centric bias, defined as an over-reliance
on the textual context that restricts intrinsic modality understanding. To systematically
analyze the existence of this bias, we propose an analytical framework involving external
perplexity-based and internal neuron-level analyses. Furthermore, to verify whether
the bias originates from the paired-only training paradigm, we introduce an analytical
methodology named Monotune, which is a simple unimodal training stage. Our analyses demonstrate that minimal
exposure to unimodal data effectively mitigates text-centric bias, providing empirical
evidence that the bias is fundamentally induced by the paired-only training strategy.
Through comprehensive downstream task evaluations, we further reveal that this structural
bias meaningfully affects real-world multimodal task performance, particularly under
limited textual contexts. Our findings highlight a fundamental limitation in current
discrete token-based MLLM training paradigms and suggest directions for future multimodal
training strategies. Our code and experiments are available at https://github.com/41312432/Monotune
Generalizing Query Performance Prediction under Retriever and Concept Shifts via Data-driven
Correction
- Jaehwan Jung
- Jong-June Jeon
Query Performance Prediction (QPP) aims to estimate the effectiveness of an information
retrieval (IR) system without access to ground-truth relevance judgments. Existing
supervised QPP methods typically follow a regression model framework that maps query-document
representations to target metrics such as RR@10 or nDCG@10. However, these approaches often suffer from degraded performance under concept
shift, where the distribution of relevance given a query-document pair changes between
training and test datasets. This paper proposes a novel classification-based framework,
QPP-MLC (QPP Multi-Label Classification), which formulates QPP as a multi-label classification
task. QPP-MLC infers the relevance of each document among the top-k retrieved results and aggregates these document-level relevance predictions to predict
the overall query performance. As a result, QPP-MLC provides a diagnostic tool for
concept shift and a correction method under concept shift by modulating the classification
threshold. Experiments on MS MARCO and TREC DL benchmarks
show that QPP-MLC achieves strong prediction accuracy and outperforms traditional
regression-based QPP methods.
Efficiency Boost in Decentralized Optimization: Reimagining Neighborhood Aggregation
with Minimal Overhead
- Durgesh Kalwar
- Mayank Baranwal
- Harshad Khadilkar
In today's data-sensitive landscape, distributed learning emerges as a vital tool,
not only fortifying privacy measures but also streamlining computational operations.
This becomes especially crucial within fully decentralized infrastructures where local
processing is imperative due to the absence of centralized aggregation. Here, we introduce
DYNAWEIGHT, a novel framework to information aggregation in multi-agent networks.
DYNAWEIGHT offers substantial acceleration in decentralized learning with minimal
additional communication and memory overhead. Unlike traditional static weight assignments,
such as Metropolis weights, DYNAWEIGHT dynamically allocates weights to neighboring
servers based on their relative losses on local datasets. Consequently, it favors
servers possessing diverse information, particularly in scenarios of substantial data
heterogeneity. Our experiments on various datasets (MNIST, CIFAR10, and CIFAR100), incorporating
various server counts and graph topologies, demonstrate notable enhancements in training
speeds. Notably, DYNAWEIGHT functions as an aggregation scheme compatible with any
underlying server-level optimization algorithm, underscoring its versatility and potential
for widespread integration.
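The core aggregation idea (weights for neighboring servers derived from their relative losses on the local dataset) can be sketched as a softmax over neighbor losses. One plausible reading, assumed here, gives larger weight to neighbors whose models incur higher loss on the local data, since such neighbors likely hold information the local server lacks; the softmax form, sign, and temperature are assumptions, not the authors' exact rule.

```python
import numpy as np

def dynamic_neighbor_weights(local_losses: dict[str, float],
                             temperature: float = 1.0) -> dict[str, float]:
    """Softmax weights over neighbors based on the loss of each neighbor's model
    evaluated on the local dataset (assumed reading: higher loss -> more weight)."""
    names = list(local_losses)
    losses = np.array([local_losses[n] for n in names]) / temperature
    w = np.exp(losses - losses.max())
    w /= w.sum()
    return dict(zip(names, w))

def aggregate(params: dict[str, np.ndarray], weights: dict[str, float]) -> np.ndarray:
    """Weighted average of neighbor parameter vectors."""
    return sum(weights[n] * params[n] for n in params)

# Hypothetical peer names and loss values.
print(dynamic_neighbor_weights({"peer_a": 0.4, "peer_b": 1.3, "peer_c": 0.9}))
```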
Causal Effect Variational Transformer for Public Health Measures and COVID-19 Infection
Cluster Analysis
- Jinho Kang
- Sungjun Lim
- Hojun Park
- Jiyoung Jung
- Jaehun Jung
- Kyungwoo Song
Recent research increasingly integrates causal inference into deep learning models
to enhance the explainability and robustness of medical applications. However, data
scarcity remains a fundamental challenge due to privacy constraints and the high cost
of data collection. This issue, compounded by complex variable dependencies and unobserved
latent confounders, hinders the reliable estimation of causal effects. To address
these challenges, we collect two real-world COVID-19 infection cluster datasets, including
public health measures, from distinct distributions in collaboration with local governments,
a medical university, and a hospital. We also propose a cut-off augmentation method
that generates diverse feature-label pairs by slicing time-series sequences at different
observation windows, effectively simulating partial observations common in real-world
settings. We further introduce the Causal Effect Variational Transformer (CEVT), a
Transformer-based model that captures temporal structure and addresses the difficulty
of causal estimation under scarce data, complex dependencies, and latent confounding
by modeling multiple treatments through an iterative conditioning mechanism. We validate
the causal modeling capability of CEVT on synthetic datasets and demonstrate that,
on two distinct COVID-19 datasets, it consistently outperforms baselines in infection
prediction. Notably, the causal effects estimated by CEVT converge with findings from
medical studies on infection control, reinforcing its reliability and underscoring
its potential to inform public health decision-making.
Curriculum Guided Personalized Subgraph Federated Learning
Subgraph Federated Learning (FL) aims to train Graph Neural Networks (GNNs) across
distributed private subgraphs, but it suffers from severe data heterogeneity. To mitigate
data heterogeneity, weighted model aggregation methods personalize each local GNN
by assigning larger weights to parameters from clients with similar subgraph characteristics
inferred from their current model states. However, the sparse and biased subgraphs
often trigger rapid overfitting, causing the estimated client similarity matrix to
stagnate or even collapse. As a result, the aggregation loses effectiveness as clients
reinforce their own biases instead of exploiting diverse knowledge otherwise available.
To this end, we propose a novel personalized subgraph FL framework called Curriculum Guided Personalized SUbgraph Federated Learning (CUFL). On the client side, CUFL adopts Curriculum Learning (CL) that adaptively selects edges for training according to their reconstruction
scores, exposing each GNN first to easier, generic cross-client substructures and
only later to harder, client-specific ones. This paced exposure prevents early overfitting
to biased patterns and enables gradual personalization. By regulating personalization,
the curriculum also reshapes server aggregation from exchanging generic knowledge
to propagating client-specific knowledge. Further, CUFL improves weighted model aggregation
by estimating client similarity using fine-grained structural indicators reconstructed
on a random reference graph. Extensive experiments on six benchmark datasets confirm
that CUFL achieves superior performance compared to relevant baselines. Code is available
at https://github.com/Kang-Min-Ku/CUFL.git.
Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and
Expert-Level LLM-as-a-Judge
- Heegyu Kim
- Taeyang Jeon
- Seungtaek Choi
- Ji Hoon Hong
- Dong Won Jeon
- Ga-Yeon Baek
- Gyeong-Won Kwak
- Dong-Hee Lee
- Jisu Bae
- Chihoon Lee
- Yoon-Seo Kim
- Seon-Jin Choi
- Jin-Seong Park
- Sung Beom Cho
- Hyunsouk Cho
Materials synthesis remains a critical bottleneck in developing innovations for energy
storage, catalysis, electronics, and biomedical devices. Current synthesis design
relies heavily on empirical trial-and-error methods guided by expert intuition, limiting
the pace of materials discovery. To address this challenge, we present AlchemyBench,
a comprehensive benchmark built upon a curated dataset of 17,667 expert-verified synthesis
recipes from open-access literature.
AlchemyBench provides an end-to-end framework that supports research in large language
models (LLMs) applied to materials synthesis prediction. The benchmark encompasses
four key tasks: raw materials and equipment prediction, synthesis procedure generation,
and characterization outcome forecasting. To enable scalable evaluation, we propose
an LLM-as-a-Judge framework that leverages large language models for automated assessment,
demonstrating strong agreement with expert evaluations (e.g., Pearson's r = 0.80, Spearman's ρ = 0.78).
Our experimental results reveal that reasoning-focused models (Claude 3.7, GPT-4o)
achieve scores around 4.0 on well-documented oxide and organic synthesis targets,
but performance drops by approximately 0.3 points on electrochemical workflows. Fine-tuning
on AlchemyBench data enables a 7B-parameter open-source model to surpass generic baselines
trained on 1M samples, while retrieval-augmented generation provides an additional
+0.20 improvement when supplied with five high-similarity contexts.
AlchemyBench addresses a critical gap in the field by providing the first comprehensive,
legally redistributable benchmark for automated materials synthesis prediction. Our
contributions establish a foundation for exploring LLM capabilities in predicting
and guiding materials synthesis, ultimately accelerating experimental design and innovation
in materials science.
Self-supervised Dual-view Framework with Tailored Negative Sampling for New Activity
Detection
With the growing adoption of IoT, analyzing multi-modal sensor time series for human
activity recognition has become crucial for intelligent and context-aware applications.
While existing approaches assume a fixed set of known activities, real-world deployments
encounter new activities not present in the training data. Detecting them is challenging
due to overlapping patterns between known and new activities, high intra-class variability,
and sensor heterogeneity across datasets. To address these challenges, we propose
CLAN, a novel self-supervised dual-view framework for new activity detection. It employs
a two-tower architecture that extracts discriminative representations from both time
and frequency domains. By treating multiple strongly augmented views of known activity
samples as negatives, the model learns invariant representations that effectively
distinguish new activities. In addition, a dataset-aware augmentation selection mechanism
adaptively determines transformations tailored to each dataset's characteristics,
thereby enhancing generalization across diverse sensor environments. Extensive experiments
on five real-world human activity datasets demonstrate that CLAN consistently outperforms
new activity detection baselines, achieving up to 9.24% improvement in AUROC.
Exploring Diverse Sparse Network Structures via Dynamic Pruning with Weight Alignment
- Jinwoo Kim
- Jongyun Shin
- Sangho An
- Jangho Kim
Deep neural networks (DNNs) often require a large number of parameters, which has
led to the development of model pruning techniques that remove weight connections.
In this paper, we propose a new method to maximize the effect of finding sparse patterns
through a gradient scaling technique that modifies the weight distribution. This approach
allows for the exploration of more diverse sparse patterns compared to traditional
dynamic pruning methods, leading to the discovery of stable subnetworks. Through various
experiments, we demonstrate the importance of exploration in finding better sparse
patterns. We achieve state-of-the-art performance across multiple network architectures
on datasets such as CIFAR-10/100 and ImageNet. The code is available at https://github.com/Acasia/DWA.
Leveraging Multi-facet Paths for Heterogeneous Graph Representation Learning
- Jongwoo Kim
- Seongyeub Chu
- Hyeongmin Park
- Bryan Wong
- Keejun Han
- Mun Yong Yi
Recent advancements in heterogeneous GNNs have enabled significant progress in embedding
nodes and learning relationships across diverse tasks. However, traditional methods
rely heavily on meta-paths grounded in node types, which often fail to encapsulate
the full complexity of node interactions, leading to inconsistent performance and
elevated computational demands. To address these challenges, we introduce MF2Vec,
a novel framework that shifts focus from rigid node-type dependencies to dynamically
exploring shared facets across nodes, regardless of type. MF2Vec constructs multi-faceted
paths and forms homogeneous networks to learn node embeddings more effectively. Through
extensive experiments, we demonstrate that MF2Vec achieves superior performance in
node classification, link prediction, and node clustering tasks, surpassing existing
baselines. Furthermore, it exhibits reduced performance variability due to meta-path
dependencies and achieves faster training convergence. These results highlight its
capability to analyze complex networks comprehensively. The implementation of MF2Vec
is publicly available at https://github.com/kimjongwoo-cell/MF2Vec.
From Patterns to Predictions: A Shapelet-Based Framework for Directional Forecasting
in Noisy Financial Markets
- Juwon Kim
- Hyunwook Lee
- Hyotaek Jeon
- Seungmin Jin
- Sungahn Ko
Directional forecasting in financial markets requires both accuracy and interpretability.
Before the advent of deep learning, interpretable approaches based on human-defined
patterns were prevalent, but their structural vagueness and scale ambiguity hindered
generalization. In contrast, deep learning models can effectively capture complex
dynamics, yet often offer limited transparency. To bridge this gap, we propose a two-stage
framework that integrates unsupervised pattern extraction with interpretable forecasting.
(i) SIMPC segments and clusters multivariate time series, extracting recurrent patterns
that are invariant to amplitude scaling and temporal distortion, even under varying
window sizes. (ii) JISC-Net is a shapelet-based classifier that uses the initial part
of extracted patterns as input and forecasts subsequent partial sequences for short-term
directional movement. Experiments on Bitcoin and three S&P 500 equities demonstrate
that our method ranks first or second in 11 out of 12 metric-dataset combinations,
consistently outperforming baselines. Unlike conventional deep learning models that
output buy-or-sell signals without interpretable justification, our approach enables
transparent decision-making by revealing the underlying pattern structures that drive
predictive outcomes.
A Self-Supervised Mixture-of-Experts Framework for Multi-behavior Recommendation
- Kyungho Kim
- Sunwoo Kim
- Geon Lee
- Kijung Shin
In e-commerce, where users face a vast array of possible item choices, recommender
systems are vital for helping them discover suitable items they might otherwise overlook.
While many recommender systems primarily rely on a user's purchase history, recent
multi-behavior recommender systems incorporate various auxiliary user behaviors, such as item clicks and cart additions,
to enhance recommendations. Despite their overall performance gains, their effectiveness
varies considerably between visited items (i.e., those a user has interacted with through auxiliary behaviors) and unvisited items (i.e., those with which the user has had no such interactions). Specifically, our
analysis reveals that (1) existing multi-behavior recommender systems exhibit a significant
gap in recommendation quality between the two item types (visited and unvisited items)
and (2) achieving strong performance on both types with a single model architecture
remains challenging. To tackle these issues, we propose a novel multi-behavior recommender
system, MEMBER. It employs a mixture-of-experts framework, with experts designed to
recommend the two item types, respectively. Each expert is trained using a self-supervised
method specialized for its design goal. In our comprehensive experiments, we show
the effectiveness of MEMBER across both item types, achieving up to 65.46% performance
gain over the best competitor in terms of Hit Ratio@20.
From Menus to the Interactive Food-Ordering Systems
- Min-Ji Kim
- Seong-Jin Park
- Jaehwan Ha
- Ju-Won Seo
- Dinara Aliyeva
- Kang-Min Kim
Conversational interfaces have emerged as an accessible and user-friendly alternative
to traditional touch-based self-service kiosks in food-ordering systems. Despite their
promise, building such systems remains challenging due to the need for costly data
annotation, store-specific model adaptation, and scalable deployment. In this study,
we propose a fully automated, end-to-end framework that transforms structured menu
databases into high-quality annotated datasets and efficiently deploys store-specific
conversational models using a parameter-efficient fine-tuning method. Our approach
fine-tunes only 0.9% of the backbone model parameters per store, enabling cost-effective
and plug-and-play deployment across diverse environments. To enhance robustness, we
further integrate a recommendation module that suggests alternative items when requested
menu options are unavailable. Experimental results on data from 27 stores in South
Korea demonstrate that our framework consistently outperforms existing data generation
baselines in intent classification and slot filling performance, while maintaining
high annotation quality. Simulated real-world voice-ordering scenarios confirm the
practicality of our framework for rapid, scalable, and accessible deployment in real-world
environments.
OASIS: Open-world Adaptive Self-supervised and Imbalanced-aware System
- Miru Kim
- Mugon Joe
- Minhae Kwon
The expansion of machine learning into dynamic environments presents challenges in
handling open-world problems where label shift, covariate shift, and unknown classes
emerge concurrently. Post-training methods have been explored to address these challenges,
adapting models to newly emerging data. However, these methods struggle when the initial
pre-training is performed on class-imbalanced datasets, limiting generalization to
minority classes. To address this, we propose OASIS, an Open-world Adaptive Self-supervised and Imbalanced-aware System. OASIS consists of two learning phases: pre-training and post-training. The
pre-training phase aims to improve the classification performance of samples near
class boundaries via a novel borderline sample refinement step. Notably, the borderline
sample refinement step critically improves the robustness of the decision boundary
in the representation space. Through this robustness of the pre-trained model, OASIS
generates reliable pseudo-labels, adapting the model against open-world problems in
the post-training phase. Extensive experiments demonstrate that our method significantly
outperforms state-of-the-art post-training techniques in both accuracy and efficiency
across diverse open-world scenarios.
Online Activation Value-aware Clustering and Aggregation for Faithful Argumentative
Explanations
- Ungsik Kim
- Jiho Bae
- Sang-Min Choi
- Suwon Lee
Argumentative explainable artificial intelligence employs argumentation theory to
explain the mechanisms of machine learning. Previous approaches for explaining deep
learning models collectively compressed layers via clustering. However, this resulted
in accumulated information loss across layers, thereby degrading the fidelity of explanations.
We propose online activation value-aware clustering and aggregation, a compression
algorithm that preserves the inference structure of the original neural network with
greater fidelity. The proposed method sequentially compresses each layer, immediately
recalculates activation values following compression, and rectifies inter-layer information
loss using a singular-value-scaled ridge alignment approach. To evaluate the effectiveness
of the proposed method, we introduce four novel quantitative metrics. Input-output
fidelity and structural fidelity measure how accurately the compressed model preserves
the original model predictions and internal activations. Input-output perturbation
consistency and structural perturbation consistency assess the similarity of the changes
induced by Gaussian-perturbed input data. Experiments on three benchmark datasets
(Breast Cancer, California Housing, and HIGGS) demonstrate that our method achieves
performance improvements ranging from 12.9% to 53.7% across the four metrics, indicating
significantly higher explanation fidelity than existing approaches.
Where Does Legal AI Fail? Evaluating RAG Pipelines
Retrieval Augmented Generation (RAG) frameworks have been integrated into LLMs for
QA tasks, which have enabled lower levels of hallucination during the generation stage.
Despite ample research in evaluating and detecting hallucination, research regarding
its source in the legal domain is still minimal. In this paper, we break down each
component of the RAG framework and analyze how they affect the quality of the generation.
We use the expert answers of 1,039 real Korean civil complaints provided by the Korean
Ministry of Economy and Finance to analyze the quality of generation across 12 experimental
configurations that vary in retrieval, reranking, and generation components. To evaluate
generation quality, we employ LLM-as-Judge comparing model outputs against expert
responses across three dimensions: conclusion consistency, correctness and incorrectness.
We validate our findings through logistic regression analysis, establishing statistical
relationships between component metrics (Recall@200, NDCG@20) and generation consistency.
Despite differing performance across models, two patterns emerge from the generation
evaluation: increased retrieval performance correlates with decreased generation quality,
and increased reranker performance correlates with higher generation quality. Our
work questions the conventional wisdom that ''more context is better'' and provides
a systematic attribution framework for finding the origin of hallucination in legal
RAG systems.
Constructing Set-Compositional and Negated Representations for First-Stage Ranking
- Antonios Minas Krasakis
- Andrew Yates
- Evangelos Kanoulas
Set compositional and negated queries are crucial for expressing complex information
needs and enable the discovery of niche items like ''Books about non-European monarchs''.
Despite the recent advances in LLMs, first-stage ranking remains challenging due to
the requirement of encoding documents and queries independently from each other. This
limitation calls for constructing compositional query representations that encapsulate
logical operations or negations, and can be used to match relevant documents effectively.
In the first part of this work, we explore constructing such representations in a
zero-shot setting using vector operations between lexically grounded Learned Sparse
Retrieval (LSR) representations. Specifically, we introduce Disentangled Negation that penalizes only the negated parts of a query, and a Combined Pseudo-Term approach that enhances LSR's ability to handle intersections. We find that our zero-shot
approach is competitive and often outperforms retrievers fine-tuned on compositional
data, highlighting certain limitations of LSR and Dense Retrievers. Finally, we address
some of these limitations and improve LSR's representation power for negation, by
allowing them to attribute negative term scores and effectively penalize documents
containing the negated terms.
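A zero-shot vector operation on learned sparse (term-weight) representations can be sketched as follows: encode the positive and negated parts of the query separately and subtract a scaled version of the negated part before sparse dot-product scoring. The dictionary-based representation, penalty factor, and example term weights are assumptions, not the paper's exact Disentangled Negation operator.

```python
from collections import defaultdict

def negation_aware_query(pos_terms: dict[str, float],
                         neg_terms: dict[str, float],
                         penalty: float = 1.0) -> dict[str, float]:
    """Combine sparse term weights of the positive query part with a penalty
    applied only to terms of the negated part (assumed zero-shot construction)."""
    q = defaultdict(float, pos_terms)
    for term, w in neg_terms.items():
        q[term] -= penalty * w       # negated terms push matching documents down
    return dict(q)

def score(query: dict[str, float], doc: dict[str, float]) -> float:
    """Sparse dot-product ranking score."""
    return sum(w * doc.get(t, 0.0) for t, w in query.items())

q = negation_aware_query({"books": 1.2, "monarchs": 1.0}, {"european": 0.9})
print(score(q, {"books": 0.8, "monarchs": 0.7, "european": 0.6}),   # penalized
      score(q, {"books": 0.8, "monarchs": 0.7, "asian": 0.5}))      # preferred
```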
FairRegBoost: An End-to-End Data Processing Framework for Fair and Scalable Regression
- Nico Lässig
- Melanie Herschel
Fairness-aware machine learning has gained significant attention due to the growing
demand for ethical decision-support systems. This paper introduces FairRegBoost, a novel fairness-aware regression framework that takes a holistic data management
perspective by integrating automated data preparation, uncertainty modeling, and post-processing
adjustments using optimal transport techniques into effective and efficient solutions.
Our approach effectively balances predictive accuracy and fairness by minimizing the
output distribution distance between protected groups, leveraging uncertainty and
sample similarities guiding the transport. We conduct extensive experiments on real-world
datasets with both single and multiple protected attributes. Results demonstrate that
FairRegBoost consistently achieves superior fairness-accuracy trade-offs compared to state-of-the-art
approaches. Moreover, our scalability analysis highlights the computational efficiency,
making it a practical choice for large-scale applications.
D-HAT: Dynamic Hypergraph Representation Learning with Attention-Based Multi-Level
Hypergraph Sampling
- Ah-hyun Lee
- Gordon Euhyun Moon
Hypergraph Neural Networks (HNNs) leverage higher-order interactions in graph-structured
data to enable effective representation learning across a wide range of applications.
Due to the issue of incorporating irrelevant relations in full hypergraphs, it is
crucial to adopt a hypergraph sampling method that efficiently captures substructures
while preserving representational quality. However, existing hypergraph sampling methods
that target only nodes or hyperedges suffer from subgraph disconnection issues and
neglect of node importance due to the randomness in sampling and use of static computational
sub-hypergraphs. In this paper, we propose D-HAT, a hypergraph learning framework
that dynamically constructs representative sub-hypergraphs through a novel attention-based
multi-level hypergraph sampling strategy during the training of HNNs. To prioritize
informative neighbors and enhance the representational quality of sub-hypergraphs
during training, we develop a new attention-based HNNs incorporating attention-guided
aggregation and dense skip connections. To the best of our knowledge, this paper is
the first to quantitatively compare various hypergraph sampling methods for hypergraph
representation learning. Experiments on real-world graph datasets demonstrate the
effectiveness of D-HAT, which consistently achieves higher accuracy compared to existing
hypergraph sampling methods.
Advanced Privacy Protection in Federated Learning using Server-initiated Homomorphic
Encryption
- Cameron Lee
- Matthew L. Daggitt
- Yansong Gao
- Jin B. Hong
Federated learning (FL) has been widely adopted to provide machine learning (ML) privacy,
protecting sensitive user data from leakage. However, there are still attacks that
could exploit FL to access users' sensitive data, such as model inversion attacks,
property inference attacks, and membership inference attacks. Various solutions were
proposed to secure FL using various privacy-preserving techniques, such as differential
privacy, homomorphic encryption, and multi-party encryption. However, existing solutions
often add noise to the model that hinders the accuracy, or introduce large computational
overhead that makes them impractical to use. In this paper, we propose a new privacy
protection scheme for FL that uses homomorphic encryption (HE), noise, and secret
sharing to protect users' sensitive data from up to n-2 adversarial clients and the
server colluding. The computational overhead is minimised by transferring expensive
computations of HE to the server, requiring only the encryption and homomorphic addition
to be carried out by clients. We provide proof sketches to validate the security of
our scheme, and experimental results to demonstrate the practicality of our proposed
scheme. The results show that our scheme adds at most 8% overhead relative to base FL
models without any loss in accuracy, regardless of the data used.
Amortized Baseline Selection via Rank-Revealing QR for Efficient Model Explanation
- Chanwoo Lee
- Youngjin Park
- Hyeongeun Lee
- Yeeun Yoo
- Daehee Han
- Junho Choi
- Geonhyeong Kim
- Nari Kim
- Jaesik Choi
Model-agnostic explanation methods are essential for interpreting machine learning
models, but suffer from prohibitive computational costs that scale with the number
of baselines. Existing acceleration approaches either lack a theoretical basis or provide
no principled guidance for baseline selection. To address this gap, we present ABSQR
(Amortized Baseline Selection via Rank-Revealing QR). This framework exploits the
low-rank structure of value matrices to accelerate multi-baseline attribution methods.
Our approach combines deterministic baseline selection via SVD-guided QR decomposition
with an amortized inference mechanism that utilizes cluster-based retrieval. We reduce
computational complexity from O(m · 2^d) to O(k · 2^d), where k ≪ m. Experiments demonstrate that ABSQR achieves a 91.2% agreement rate
with full baseline methods while providing 8.5× speedup across diverse datasets. As
the first acceleration approach that preserves explanation error guarantees under
computational speedup, ABSQR makes the practical deployment of interpretable AI systems
feasible at scale.
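Column-pivoted (rank-revealing) QR is a standard way to pick k representative columns of a matrix. The sketch below applies it to an assumed value matrix with one column per candidate baseline, which is one plausible reading of the selection step; the matrix layout and k are assumptions, and the amortized cluster-based retrieval is not shown.

```python
import numpy as np
from scipy.linalg import qr

def select_baselines(value_matrix: np.ndarray, k: int) -> np.ndarray:
    """Pick k representative baseline columns via column-pivoted QR.

    Pivoting orders columns by how much new information they contribute, so the
    first k pivots form a well-conditioned, representative subset of baselines.
    """
    _, _, piv = qr(value_matrix, mode="economic", pivoting=True)
    return piv[:k]  # column indices of the selected baselines

rng = np.random.default_rng(0)
low_rank = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 200))   # roughly rank-5 matrix
noisy = low_rank + 0.01 * rng.normal(size=(50, 200))
print(select_baselines(noisy, k=5))
```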
Mitigating Distribution Shift in Stock Price Data via Return-Volatility Normalization
for Accurate Prediction
- Hyunwoo Lee
- Jihyeong Jeon
- Jaemin Hong
- U Kang
How can we address distribution shifts in stock price data to improve stock price
prediction accuracy? Stock price prediction has attracted attention from both academia
and industry, driven by its potential to uncover complex market patterns and enhance
decision-making. However, existing methods often fail to handle distribution shifts
effectively, focusing on scaling or representation adaptation without fully addressing
distributional discrepancies and shape misalignments between training and test data.
We propose ReVol (Return-Volatility Normalization for Mitigating Distribution Shift
in Stock Price Data), a robust method for stock price prediction that explicitly addresses
the distribution shift problem. ReVol leverages three key strategies to mitigate these
shifts: (1) normalizing price features to remove sample-specific characteristics,
including return, volatility, and price scale, (2) employing an attention-based module
to estimate these characteristics accurately, thereby reducing the influence of market
anomalies, and (3) reintegrating the sample characteristics into the predictive process,
restoring the traits lost during normalization. Additionally, ReVol combines geometric
Brownian motion for long-term trend modeling with neural networks for short-term pattern
recognition, unifying their complementary strengths. Extensive experiments on real-world
datasets demonstrate that ReVol enhances the performance of the state-of-the-art backbone
models in most cases, achieving an average improvement of more than 0.03 in IC and
over 0.7 in SR across various settings.
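Return-volatility normalization can be sketched, under assumptions, as: convert prices to log returns, standardize them by per-sample estimates of return mean and volatility, predict in the normalized space, then reintegrate the removed characteristics. The window handling and reintegration formula below are illustrative; the paper's attention-based estimator of these characteristics is not shown.

```python
import numpy as np

def normalize_returns(prices: np.ndarray):
    """Remove sample-specific drift and volatility from a price window."""
    log_ret = np.diff(np.log(prices))
    mu, sigma = log_ret.mean(), log_ret.std() + 1e-8
    return (log_ret - mu) / sigma, mu, sigma, prices[-1]

def denormalize_next_price(z_pred: float, mu: float, sigma: float,
                           last_price: float) -> float:
    """Reintegrate the removed characteristics to recover a price forecast."""
    return last_price * np.exp(z_pred * sigma + mu)

prices = np.array([100.0, 101.2, 100.8, 102.5, 103.1])   # hypothetical price window
z, mu, sigma, last = normalize_returns(prices)
print(denormalize_next_price(0.3, mu, sigma, last))       # forecast from a normalized prediction
```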
Context-aware Sequential Bundle Recommendation via User-specific Representations
How can we recommend bundles that reflect users' changing preferences over time? Sequential
bundle recommendation aims to recommend bundles of items while capturing users' evolving
preferences over time. Unlike traditional bundle or sequential recommendation, this
task requires modeling both the structural composition of bundles and the temporal
dynamics of user behavior. We identify three major challenges: (1) dynamic user preferences
across bundle interactions, (2) user-dependent attention to different items in the
same bundle, and (3) users' diverse preferences on bundling strategies. To address
these, we propose CoReSBR (Contextualized Representation for Sequential Bundle Recommendation),
an adaptive framework that constructs bundle representations contextualized by time-aware
user preferences. CoReSBR encodes recent user interactions to reflect preference shifts,
assigns attention-based weights on items in bundles using user embedding as query,
and integrates multiple bundling strategies through user-specific combination. Extensive
experiments on real-world datasets demonstrate that CoReSBR outperforms the state-of-the-art
methods, achieving up to 8.91% higher nDCG and 8.05% higher recall.
Learning Graph Edit Distance via Node Matching Patterns
Graph edit distance (GED) is a general and versatile measure of graph similarity.
Many combinatorial algorithms have been proposed for computing exact GED, but they
suffer from the high computational cost due to the NP-hardness of GED computation.
To address this challenge, approximate GED computation techniques have been extensively
studied. These techniques are generally twofold: early work is based on combinatorial
algorithms that restrict search space for efficient computation, while more recent
approaches employ machine learning techniques to estimate GED. Although learning-based
approaches generally achieve higher estimation accuracy than combinatorial approximations,
they often rely on smoothed node embeddings to model node-to-node interactions, which
may limit their ability to capture fine-grained structural differences. To alleviate
this limitation, we exploit the insight that sequential variations in node interactions
across GNN layers exhibit informative patterns. In this paper, we design a novel neural
model, Grasp, that learns to extract and leverage these patterns to predict pairwise
node matching probabilities and their associated costs. By aggregating these estimates
from the perspectives of both input graphs, Grasp effectively and accurately computes
an approximate GED. Experimental results on real-world datasets demonstrate that Grasp
significantly improves estimation accuracy over existing approximate methods.
SwaGNER: Leveraging Span-aware Grid Transformers for Accurate Nested Named Entity
Recognition
- Seungjoo Lee
- Yong-chan Park
- U Kang
How can we accurately recognize overlapping entity spans in text while effectively
capturing global context among spans? Nested Named Entity Recognition (nested NER)
becomes challenging in the presence of nested or overlapping entity spans. Traditional
span-based methods enumerate all possible spans, resulting in high computational costs
and severe label imbalance from excessive negative spans which are non-entities. Additionally,
they often fail to fully capture global context among overlapping entities.
In this paper, we propose SwaGNER, a novel approach that dynamically selects span candidates via boundary detection,
and encodes their interactions using a contextual grid. By filtering out low-confidence
spans early, SwaGNER focuses on high-quality candidates, arranging them in a grid to explicitly model
global context and span relationships. This integrated boundary detection and span
classification approach reduces error propagation and effectively leverages sentence-wide
context. Experiments demonstrate that SwaGNER achieves state-of-the-art performance on both nested NER and flat NER.
Pedagogy-R1: Pedagogical Large Reasoning Model and Well-balanced Educational Benchmark
- Unggi Lee
- Jaeyong Lee
- Jiyeong Bae
- Yeil Jeong
- Junbo Koh
- Gyeonggeon 'Boaz' Lee
- Gunho Lee
- Taekyung Ahn
- Hyeoncheol Kim
Recent advances in large reasoning models (LRMs) have demonstrated impressive capabilities
in highly structured domains such as mathematics and programming. However, their application
to education, where effective reasoning must be pedagogically meaningful, context-sensitive,
and responsive to real student needs, remains relatively unexplored. Existing large
language models (LLMs) often struggle to deliver instructional coherence, formative
feedback, or simulate sophisticated teacher decision-making, limiting their practical
utility in educational settings. To fill this gap, we present Pedagogy-R1, a comprehensive
pedagogical reasoning framework designed to adapt LLMs for authentic classroom tasks.
Our approach features three key innovations: (1) a distillation-based training pipeline
that uses pedagogically filtered outputs for instruction tuning, (2) the Well-balanced
Educational Benchmark (WBEB), which systematically evaluates models across five dimensions
(subject knowledge, pedagogical knowledge, knowledge tracing, essay scoring, and real-world
teacher decision-making), and (3) the Chain-of-Pedagogy (CoP) prompting strategy, employed
both to generate pedagogically enriched training data and to elicit teacher-like reasoning
during inference. We conduct a mixed-methods evaluation, combining fine-grained quantitative
analyses of model performance with qualitative insights into the model's pedagogical
reasoning patterns.
BOVIS: Bias-Mitigated Object-Enhanced Visual Emotion Analysis
- Yubeen Lee
- Sangeun Lee
- Junyeop Cha
- Jufeng Yang
- Eunil Park
Visual emotion analysis is a promising field that aims to predict emotional responses
elicited by visual stimuli. While recent advances in deep learning have significantly
improved emotion detection capabilities, existing methods often fall short because
of their exclusive focus on either holistic visual features or semantic content, thereby
neglecting their interplay. To address this limitation, we introduce BOVIS, a Bias-Mitigated
Object-Enhanced Visual Emotion Analysis framework. To capture the subtle relationships
between visual and semantic features and enrich the understanding of emotional contexts,
BOVIS leverages pre-trained models to extract comprehensive image features, integrate
object-level semantics, and enhance contextual information. Moreover, BOVIS incorporates
a bias mitigation strategy that involves an adjusted Mean Absolute Error loss function
alongside an Inverse Probability Weighting method to address dataset imbalances and
enhance fairness in emotion prediction. Comprehensive evaluations across various benchmark
datasets demonstrate the effectiveness of the BOVIS framework in enhancing visual
emotion analysis. The results reveal that the synergy between object-specific features
and holistic visual representations improves the accuracy and interpretability of
emotion analysis, while optimizing bias mitigation enhances fairness and increases
reliability. The code is available at https://github.com/leeyubin10/BOVIS.git.
What Data is Really Necessary? A Feasibility Study of Inference Data Minimization
for Recommender Systems
- Jens Leysen
- Marco Favier
- Bart Goethals
Data minimization is a legal principle requiring personal data processing to be limited
to what is necessary for a specified purpose. Operationalizing this principle for
recommender systems, which rely on extensive personal data, remains a significant
challenge. This paper conducts a feasibility study on minimizing implicit feedback
inference data for such systems. We propose a novel problem formulation, analyze various
minimization techniques, and investigate key factors influencing their effectiveness.
We demonstrate that substantial inference data reduction is technically feasible without
significant performance loss. However, its practicality is critically determined by
two factors: the technical setting (e.g., performance targets, choice of model) and
user characteristics (e.g., history size, preference complexity). Thus, while we establish
its technical feasibility, we conclude that data minimization remains practically
challenging and its dependence on the technical and user context makes a universal
standard for data 'necessity' difficult to implement.
Fast Outlier Detection in Oblique Subspaces
- Bowen Li
- Charu C. Aggarwal
- Peixiang Zhao
Subspace outlier detection is a fundamental data mining task for high-dimensional
data with diverse applications across widely varying domains. Existing subspace hashing
methods have been capable of identifying accurate and interpretable outliers in axis-parallel
subspaces in linear time. However, these methods simply fail when applied to arbitrary-shaped,
schema-less data that lack well-defined attributes or dimensions, such as time series
and graphs. In this paper, we introduce a new notion of oblique subspaces defined
on pairwise object proximity functions without requiring explicit multidimensional
representations of the underlying data. By hashing data objects into arbitrarily oriented
oblique subspaces, we can construct subspace hashing histograms for efficient and
cost-effective outlier detection. Our proposed solution, OS-Hash (Oblique Subspace
Hashing), is a linear-time, constant-space method applicable to not only multidimensional
but also arbitrary-shaped data, and it can further be extended to the data stream
setting for subspace outlier detection. Our experimental studies on real-world multidimensional,
time series, and graph data demonstrate the efficiency and efficacy of OS-Hash, which
outperforms state-of-the-art outlier detection methods in terms of both runtime performance
and accuracy.
Bridging Queries and Tables through Entities in Open-Domain Table Retrieval
- Da Li
- Keping Bi
- Jiafeng Guo
- Xueqi Cheng
Open-domain table retrieval plays a vital role in accessing information from structured
formats on the web, yet it remains less explored than text retrieval. Table cells
primarily consist of phrases and words, which include numerous entities, such as times,
locations, persons, and organizations. While emphasizing entities in text retrieval
has been extensively studied, there is a significant lack of research on their applications
in table retrieval. In this work, we explore how to leverage entities in tables to
improve retrieval performance. We investigate the important role of entities in table
retrieval from a statistical perspective and propose an Entity-Centric Alignment framework
for Table retrieval (ECAT). Specifically, we use entity types to highlight entities
appearing in queries and tables. Then, we propose an entity-driven late interaction
paradigm based on entity representations for dense and sparse retrievers, respectively.
Our proposed framework is plug-and-play and flexible, making it easy to integrate
into existing table retrievers. Empirical results on table retrieval benchmarks, NQ-TABLES
and OTT-QA, show that our proposed ECAT is effective in enhancing existing retrievers.
Extensive analyses confirm the efficacy of ECAT's different components. Our code and
dataset are available at https://github.com/Trustworthy-Information-Access/ECAT.
Publicly Verifiable and Fault-Tolerant Privacy-Preserving Aggregation for Federated
Learning
- Guohao Li
- Qi Jiang
- Lu Zhou
- Li Yang
Publicly verifiable privacy-preserving aggregation is widely regarded as an effective
approach to protect user privacy and ensure the integrity of the aggregated model
published by the aggregator in Federated Learning (FL). State-of-the-art solutions
either fail to guarantee unforgeability when the aggregator colludes with malicious
users or require costly cryptographic operations during the online aggregation phase
and lack fault tolerance. In this work, we propose eVTPA, the first online-efficient,
publicly verifiable, and fault-tolerant privacy-preserving aggregation protocol considering
malicious users and aggregators for FL. We introduce a novel collusion-resistant symmetric
masking technique to conceal users' local gradients while ensuring the correctness
of the aggregated model through a publicly verifiable aggregation signature algorithm.
To improve the efficiency of online signature generation, we design a specialized
precomputation-based acceleration method and leverage the randomness of masking to
enable batch processing. Furthermore, eVTPA adopts a dynamic mask update mechanism
that tolerates user dropouts without affecting the validation of the aggregated model.
Security analysis shows that eVTPA meets FL's confidentiality, integrity, and authenticity
requirements. Experimental results demonstrate that our scheme maintains model classification
accuracy while achieving online aggregation at least 7.85× faster than related solutions
at the same security level.
DO: An Efficient Deep Reinforcement Learning Approach for Optimal Route with Collective
Spatial Keywords
- Jiajia Li
- Jiming Dong
- Lei Li
- Yu Yang
- Xin Wang
- Mengxuan Zhang
Given a source, destination, and required keywords, the Optimal Route with Collective Spatial Keywords (ORCSK) query aims to find the shortest route covering all keywords. Existing Point of
Interest (POI) candidate set-based and path expansion-based methods frequently produce
inferior route quality or excessive time overhead, particularly under large-scale
query keywords. To address this challenge, we introduce the DO framework, which pioneers the employ Deep Reinforcement Learning for the ORCSK. Specifically, DO first integrates the spatial index with the H2H index to generate and refine high-quality candidate sets. Subsequently, DO utilizes a Transformer-based model to determine the optimal route from the sets.
To effectively combine spatial distance and POI attributes, we propose a novel dual-cross
encoder architecture. Furthermore, leveraging this architecture, we introduce a multi-route
generating strategy, exploiting parallel computing to enhance route quality. Our experiments
on real-life road networks demonstrate superior route quality and response time compared
to the state-of-the-art method, with an average improvement of 1-2 orders of magnitude
in response time, and show that DO maintains high efficiency even under large-scale query keywords or dynamic POI attribute scenarios.
Where Do LLMs Go Wrong? Diagnosing Automated Peer Review via Aspect-Guided Multi-Level
Perturbation
- Jiatao Li
- Yanheng Li
- Xinyu Hu
- Mingqi Gao
- Xiaojun Wan
Large Language Models (LLMs) are increasingly integrated into academic peer review,
prompting debates between full automation and purely human evaluation. Emerging evidence
suggests optimal peer review leverages both human expertise and AI capabilities, and
several major conferences have already adopted AI-assisted reviewing practices. However,
effectively integrating these reviewers requires an aspect-based understanding of
LLM vulnerabilities, clearly identifying specific dimensions where AI is most prone
to error. Prior studies broadly caution against LLM biases but lack precise, aspect-specific
insights necessary for informed human-AI partnerships in peer-review processes. We
propose an aspect-guided, multi-level perturbation framework to systematically diagnose LLM weaknesses in automated peer review. By introducing
targeted perturbations across key review components (papers, reviews, rebuttals) and
evaluating impacts along critical quality dimensions (contribution, soundness, presentation,
tone, completeness), our framework functions as a diagnostic tool: deviations from
expected rating shifts after perturbation directly reveal specific LLM vulnerabilities.
Our empirical analyses uncover recurring weaknesses, including misclassification of
methodological flaws, disproportionate influence of strong rejection recommendations,
inadequate responses to incomplete or negatively toned rebuttals, and misinterpretation
of incorrect critiques as rigorous evaluations. These vulnerabilities consistently
persist across diverse prompting strategies and a broad set of widely-used LLMs (e.g.,
GPT-4o, Gemini 2.0, LLaMA 3). This diagnostic framework provides granular insights
into LLM limitations, empowering conference organizers to establish pragmatic, aspect-specific
guidelines and enabling balanced, informed, and robust peer-review practices.
Tilia: Enhancing LIME with Decision Tree Surrogates
- Jihang Li
- Jiacheng Qiu
- Yin-Ping Zhao
- Zeyi Wen
Local Interpretable Model-Agnostic Explanations (LIME) is a widely adopted framework
for interpreting opaque models due to its simplicity and intuitiveness. However, LIME
suffers from unreliability rooted in two core issues: (i) low fidelity, where the
surrogate model fails to accurately approximate the target model's behavior, and (ii)
instability, where the generated explanations vary significantly across runs. While
prior work has proposed techniques to enhance LIME, they remain fundamentally limited
by the expressiveness of linear surrogate models, which cannot adequately capture
complex decision boundaries. In this work, we introduce Tilia, a novel method that
employs shallow decision tree regressors as the surrogate model, leveraging its structured
and deterministic nature to improve both fidelity and stability. Tilia also provides
insight into the interplay between surrogate models and sampling strategies, revealing
new directions for enhancing explanation reliability. Across extensive experiments
on tabular and textual datasets, Tilia outperforms LIME and recent variants on both
fidelity and stability, achieving up to 100% approximation of the opaque model and
entirely consistent explanations (i.e., zero Jaccard distance). Tilia maintains practical
efficiency, completing explanations in seconds even for datasets with over 100 features.
These results position Tilia as a robust alternative for model-agnostic explanations.
The code is available at https://github.com/neur1n/tilia.
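To make the surrogate swap concrete, the following is a minimal, generic sketch of a LIME-style local explanation that fits a shallow decision tree (rather than a linear model) to an opaque model around one instance. It is not the authors' Tilia implementation; the Gaussian perturbation scheme, kernel width, and function names are illustrative assumptions.

```python
# Generic sketch: LIME-style local explanation with a shallow decision-tree
# surrogate instead of the usual linear model. Illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def tree_lime_explain(predict_fn, x, n_samples=1000, sigma=0.75, max_depth=3, seed=0):
    """Fit a shallow decision-tree surrogate to predict_fn in a neighborhood of x."""
    rng = np.random.default_rng(seed)
    # Perturb the instance with Gaussian noise (simple tabular setting).
    X_pert = x + rng.normal(scale=0.3, size=(n_samples, x.shape[0]))
    y_pert = predict_fn(X_pert)                          # opaque model's outputs
    # Proximity kernel: perturbations closer to x receive larger weights.
    dists = np.linalg.norm(X_pert - x, axis=1)
    weights = np.exp(-(dists ** 2) / (sigma ** 2))
    # Shallow, deterministic tree in place of LIME's linear surrogate.
    surrogate = DecisionTreeRegressor(max_depth=max_depth, random_state=seed)
    surrogate.fit(X_pert, y_pert, sample_weight=weights)
    fidelity = surrogate.score(X_pert, y_pert, sample_weight=weights)
    return surrogate, fidelity
```

In this generic form, the weighted R² of the tree on the perturbed neighborhood plays the role of a fidelity measure, and the tree's deterministic splits are what make repeated explanations stable.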
A Node-Aware Dynamic Quantization Approach for Graph Collaborative Filtering
- Lin Li
- Chunyang Li
- Yu Yin
- Xiaohui Tao
- Jianwei Zhang
In the realm of collaborative filtering recommendation systems, Graph Neural Networks
(GNNs) have demonstrated remarkable performance but face significant challenges in
deployment on resource-constrained edge devices due to their high embedding parameter
requirements and computational costs. Applying common quantization methods directly to
node embeddings may overlook their graph-based structure, causing error accumulation
during message passing and degrading the quality of quantized embeddings. To address
this, we propose Graph-based Node-Aware Dynamic Quantization training for collaborative
filtering (GNAQ), a novel quantization approach that leverages graph structural information
to enhance the balance between efficiency and accuracy of GNNs for Top-K recommendation.
GNAQ introduces a node-aware dynamic quantization strategy that adapts quantization
scales to individual node embeddings by incorporating graph interaction relationships.
Specifically, it initializes quantization intervals based on node-wise feature distributions
and dynamically refines them through message passing in GNN layers. This approach
mitigates information loss caused by fixed quantization scales and captures hierarchical
semantic features in user-item interaction graphs. Additionally, GNAQ employs graph
relation-aware gradient estimation to replace traditional straight-through estimators,
ensuring more accurate gradient propagation during training. Extensive experiments
on four real-world datasets demonstrate that GNAQ outperforms state-of-the-art quantization
methods, including BiGeaR and N2UQ, achieving average improvements of 27.8% in Recall@10
and 17.6% in NDCG@10 under 2-bit quantization. In particular, GNAQ is capable of maintaining
the performance of full-precision models while reducing their model sizes by 8 to
12 times; in addition, training is twice as fast compared to quantization
baseline methods.
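For background, the snippet below sketches the conventional uniform quantization with a straight-through estimator (STE) that GNAQ's graph relation-aware gradient estimation replaces; the per-tensor scale and 2-bit setting are simplifications for illustration, not GNAQ's node-aware scheme.

```python
# Generic 2-bit uniform quantization with a straight-through estimator (STE),
# i.e. the conventional baseline that relation-aware estimators replace.
import torch

def ste_quantize(x: torch.Tensor, bits: int = 2) -> torch.Tensor:
    """Uniform symmetric quantization with a straight-through gradient."""
    levels = 2 ** bits - 1
    scale = x.detach().abs().max().clamp(min=1e-8)
    x_norm = (x / scale).clamp(-1.0, 1.0)                 # normalize to [-1, 1]
    x_q = torch.round((x_norm + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0
    # Forward pass uses the quantized value; backward pass is the identity (STE).
    return x + (x_q * scale - x).detach()
```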
KUG: Joint Enhancement of Internal and External Knowledge for Retrieval-Augmented
Generation
- Mingyang Li
- Shisong Chen
- Shengkun Tu
- Ziyi Du
- Jinghao Zhang
- Zhixu Li
- Yanghua Xiao
Query enhancement, a pivotal methodology in Retrieval-Augmented Generation (RAG) for
addressing information scarcity in queries, has garnered increasing research attention.
Nevertheless, existing approaches overlook the inherent distinctions between domain-specific
knowledge and external factual sources during integration. To bridge this gap, we
propose KUG (Knowledge-Update-Generation), a novel RAG framework that leverages internal
knowledge semantics to ensure query enhancement efficacy, validates and dynamically
updates knowledge representations using external evidence, and achieves systematic
integration through knowledge graph embeddings. Extensive experiments on six standard
BEIR benchmarks demonstrate that KUG outperforms the state-of-the-art methods, achieving
an improvement of 1%-2% in recall metrics. Notably, the framework demonstrates significant
performance gains in multi-hop reasoning tasks, advancing the development paradigm
for RAG systems. The code will be made public soon.
Content-Agnostic Moderation for Stance-Neutral Recommendations
- Nan Li
- Bo Kang
- Tijl De Bie
Personalized recommendation systems often drive users towards more extreme content,
exacerbating opinion polarization. While content-aware moderation has been proposed
to mitigate these effects, such approaches risk curtailing the freedom of speech and
information. To address this concern, we propose and explore the feasibility of content-agnostic moderation as an alternative approach for reducing polarization. Content-agnostic
moderation does not rely on the actual content being moderated, arguably making it
less prone to forms of censorship. We establish theoretically that content-agnostic
moderation cannot be guaranteed to work in a fully generic setting. However, we show
that it can often be effectively achieved in practice with plausible assumptions.
We introduce two novel content-agnostic moderation methods that modify recommendations
from the content recommender to disperse user-item co-clusters without relying on
content features.
To evaluate the potential of content-agnostic moderation in controlled experiments,
we built a simulation environment to analyze the closed-loop behavior of a system
with a given set of users, a recommendation system, and a moderation approach. Through
comprehensive experiments in this environment, we show that our proposed moderation
methods significantly enhance stance neutrality and maintain high recommendation quality
across various data scenarios. Our results indicate that achieving stance neutrality
without direct content information is not only feasible but can also help develop
more balanced and informative recommendation systems without substantially degrading
user engagement.
TKHist: Cardinality Estimation for Join Queries via Histograms with Dominant Attribute
Correlation Finding
- Renrui Li
- Qingzhi Ma
- Jiajie Xu
- Lei Zhao
- An Liu
Cardinality estimation has long been crucial for cost-based database optimizers in
identifying optimal query execution plans, attracting significant attention over the
past decades. While recent advancements have significantly improved the accuracy of
multi-table join query estimations, these methods introduce challenges such as higher
space overhead, increased latency, and greater complexity, especially when integrated
with the binary join framework. In this paper, we introduce a novel cardinality estimation
method named TKHist, which addresses these challenges by relaxing the uniformity assumption in histograms.
TKHist captures bin-wise non-uniformity information, enabling accurate cardinality estimation
for join queries without filter predicates. Furthermore, we explore the attribute
independence assumption, which can lead to significant over-estimation rather than
under-estimation in multi-table join queries. To address this issue, we propose the
dominating join path correlation discovery algorithm to highlight and manage correlations
between join keys and filter predicates. Our extensive experiments on popular benchmarks
demonstrate that TKHist reduces error variance by 2-3 orders of magnitude compared to SOTA methods, while
maintaining comparable or lower memory usage.
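As background for the uniformity assumption TKHist relaxes, here is the textbook equi-width histogram estimate of an equi-join cardinality; the bin count and the integer-key approximation of distinct values per bucket are illustrative assumptions, and nothing below reflects TKHist's bin-wise non-uniformity information or its correlation discovery.

```python
# Textbook histogram-based estimate of |R join S| on an equi-join key, assuming
# values are uniform within each bucket -- the assumption TKHist relaxes.
import numpy as np

def join_cardinality_estimate(r_keys, s_keys, n_bins=16):
    """Equi-width histogram estimate of the equi-join size on an integer join key."""
    lo = min(r_keys.min(), s_keys.min())
    hi = max(r_keys.max(), s_keys.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    r_counts, _ = np.histogram(r_keys, bins=edges)
    s_counts, _ = np.histogram(s_keys, bins=edges)
    # Distinct values per bucket approximated by the bucket width (integer keys),
    # i.e. values are assumed uniformly spread inside each bucket.
    widths = np.maximum(np.diff(edges), 1.0)
    return float(np.sum(r_counts * s_counts / widths))
```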
Contextual Representation Anchor Network for Mitigating Selection Bias in Few-Shot
Drug Discovery
- Ruifeng Li
- Wei Liu
- Xiangxin Zhou
- Mingqian Li
- Qiang Zhang
- Hongyang Chen
- Xuemin Lin
In the drug discovery process, the low success rate of drug candidate screening often
leads to insufficient labeled data, causing the few-shot learning problem in molecular
property prediction. Existing methods for few-shot molecular property prediction overlook
the sample selection bias, which arises from non-random sample selection in chemical
experiments. This bias in data representativeness leads to suboptimal performance.
To overcome this challenge, we present a novel method named Contextual Representation
Anchor Network (CRANet), where an anchor refers to a cluster center of the representations of molecules
and serves as a bridge to transfer enriched contextual knowledge into molecular representations
and enhance their expressiveness. CRANet introduces a dual-augmentation mechanism
that includes context augmentation, which dynamically retrieves analogous unlabeled
molecules and captures their task-specific contextual knowledge to enhance the anchors,
and anchor augmentation, which leverages the anchors to augment the molecular representations.
We evaluate our approach using the MoleculeNet and FS-Mol benchmarks, as well as through
domain transfer experiments. The outcomes indicate that CRANet surpasses current state-of-the-art
methods by 0.10% to 5.48% in AUC and 2.52% in ΔAUC-PR metrics, showcasing its exceptional
generalization abilities.
Exploring the Upper Limits of Text-Based Collaborative Filtering Using Large Language
Models: Discoveries and Insights
- Ruyu Li
- Wenhao Deng
- Yu Cheng
- Zheng Yuan
- Jiaqi Zhang
- Fajie Yuan
Text-based collaborative filtering (TCF) has emerged as the prominent technique for
text and news recommendation, employing language models (LMs) as text encoders to
represent items. However, the current landscape of TCF models mainly relies on the
utilization of relatively small or medium-sized LMs. The potential impact of using
larger, more powerful language models (such as those with over 100 billion parameters)
as item encoders on recommendation performance remains uncertain. Can we anticipate
unprecedented results and discover new insights?
To address this question, we undertake a comprehensive series of experiments aimed
at exploring the performance limits of the TCF paradigm. Specifically, we progressively
augment the scale of item encoders, ranging from one hundred million to one hundred
billion parameters, in order to reveal the scaling limits of the TCF paradigm. Moreover,
we investigate whether these exceptionally large LMs have the potential to establish
a universal item representation for the recommendation task, thereby revolutionizing
the traditional ID paradigm, which is considered a significant obstacle to developing
transferable "one model fits all" recommender models. Our study not only demonstrates
positive results but also uncovers unexpected negative outcomes, illuminating the
current state of the TCF paradigm within the community. These findings will evoke
deep reflection and inspire further research on text-based recommender systems.
Linking Ordered and Orderless Modeling for Sequential Recommendation
- Sijia Li
- Min Gao
- Zongwei Wang
- Yibing Bai
- Wuhan Chen
Sequential recommendation is pivotal to personalized services by modeling the temporal
dynamics of user behavior. However, existing methods often rely on abundant interactions,
making it unreliable under sparse user interactions. Recent attempts to integrate
sequential signals with orderless structural cues (e.g., global co-occurrence) help
alleviate this issue but typically adopt tight fusion, which can dilute order-aware
signals. To address this, we propose LOOM (Loosely-Coupled Ordered-Orderless Modeling),
a structure-agnostic guidance module for sequential recommenders. LOOM is sequence-first:
The sequential backbone acts as a teacher, guiding orderless carriers via one-way
KL divergence, with recency-aware weighting and confidence-modulated strength to filter
stale or uncertain relations. This preserves temporal modeling while selectively incorporating
complementary orderless knowledge. Experiments on four public datasets and various
sequential architectures show that LOOM outperforms state-of-the-art methods. Code
is available at https://github.com/cqu-jia/LOOM.
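As a rough, generic illustration of the one-way KL guidance described above (not LOOM's released module), the sketch below distills a frozen sequential teacher's item distribution into an orderless student, with a per-sample recency weight; the names and the simple weighting are assumptions.

```python
# Minimal sketch of one-way KL guidance from a sequential teacher to an
# orderless student, with a per-sample recency weight. Illustrative only.
import torch
import torch.nn.functional as F

def one_way_kl_guidance(teacher_logits, student_logits, recency_weight):
    """teacher_logits, student_logits: [batch, n_items]; recency_weight: [batch]."""
    teacher_probs = F.softmax(teacher_logits.detach(), dim=-1)   # teacher is not updated
    student_logp = F.log_softmax(student_logits, dim=-1)
    # KL(teacher || student), summed over items, one value per sample.
    kl_per_sample = F.kl_div(student_logp, teacher_probs, reduction="none").sum(dim=-1)
    return (recency_weight * kl_per_sample).mean()
```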
Give Me Some SALT: Structure-Aware Link Modeling for Temporal Weighted Link Prediction
- Ting Li
- Hanchen Wang
- Yiran Li
- Xiaolei Liu
In dynamic graph analysis, research has predominantly focused on temporal link prediction
(TLP) for unweighted links, with growing interest in predicting temporal link weights
in recent years. Temporal weighted link prediction (TWLP) aims to estimate both the
existence and the link weights, which is naturally formulated as a regression task.
The long-tail distribution and short-term randomness of link weights pose significant
challenges for TWLP. In this paper, we introduce SALT, a Structure-Aware Link modeling
for Temporal weighted link prediction, which consists of Weighted Link Encoder (WLE)
and Temporal Link State Space Module (TLSSM). WLE encodes each snapshot into link-centric
embeddings with common neighbor information, and addresses the long-tail issue by
leveraging weights to adjust the embedding distribution. Additionally, TLSSM is designed
to handle short-term randomness in temporal modeling. On eight datasets, our model
achieves average reductions of 19.86% in RMSE and 24.61% in MAE compared to state-of-the-art
baselines.
QGCMA: A Framework for Knowledge-Based Visual Question Answering
Visual Question Answering (VQA) systems encounter formidable challenges when tackling
complex queries that demand external knowledge integration and multi-modal reasoning.
Current methodologies often grapple with the effective alignment of visual and textual
features, as well as the utilization of structured knowledge bases, which limits their
performance in handling intricate semantic and inferential tasks. To address these
critical issues, this paper presents a framework based on three key innovations. Firstly,
the Question-Guided Attention (QGA) mechanism adaptively steers the model's focus
towards visual regions and knowledge entities that are semantically congruent with
the query. By doing so, it ensures that contextually relevant information is prioritized
during the feature extraction process, enhancing the model's ability to capture pertinent
visual and knowledge cues. Secondly, the Cross-Modal Alignment (CMA) module employs
a contrastive learning strategy to enforce precise alignment across visual, textual,
and knowledge modalities. This approach effectively mitigates the detrimental effects
of spurious correlations by enhancing semantic consistency among heterogeneous data
sources, thereby improving the overall quality of multi-modal feature integration.
Thirdly, the Dynamic Knowledge Integration (DKI) component empowers the model to dynamically
select and fuse knowledge information from external graph structures. This functionality
significantly augments the model's reasoning capacity, enabling it to handle questions
that necessitate compositional inference over structured knowledge. Comprehensive
experimental evaluations conducted on the OK-VQA and VQA v2 benchmarks demonstrate
the superiority of our proposed method over existing state-of-the-art methods.
Scenario-Wise Rec: A Multi-Scenario Recommendation Benchmark
- Xiaopeng Li
- Jingtong Gao
- Pengyue Jia
- Xiangyu Zhao
- Yichao Wang
- Wanyu Wang
- Yejing Wang
- Yuhao Wang
- Huifeng Guo
- Ruiming Tang
Multi-Scenario Recommendation (MSR) tasks, referring to building a unified model to
enhance performance across all recommendation scenarios, have recently gained considerable
attention. However, current research in MSR faces two significant challenges that
hinder the field's development: the absence of uniform procedures for multi-scenario
dataset processing, which prevents fair comparisons, and the fact that most models are
closed-source, which complicates comparison with current SOTA models. Consequently, we introduce
our benchmark, Scenario-Wise Rec, which comprises six public datasets and twelve baseline
models, along with a training and evaluation pipeline. We further validate Scenario-Wise
Rec on an industrial advertising dataset, underscoring its robustness. We hope the
benchmark will give researchers clear insights into prior work, enabling them to develop
novel models and thereby fostering a collaborative research ecosystem in MSR. Our
source code is publicly available (https://github.com/Applied-Machine-Learning-Lab/Scenario-Wise-Rec).
Structural Entropy-based Multivariate Time Series Forecasting
- Xinhui Li
- Kun Yue
- Lixing Yu
- Peizhong Yang
Multivariate time series (MTS) forecasting is crucial for predicting the future states
of complexly coupled variables based on historical observations. To effectively capture
the intricate interdependencies within MTS, graph-based methods have emerged as powerful
tools. However, existing graph construction methods often produce structures that
fail to preserve key temporal and cross-variable dependencies, introducing redundant
or irrelevant connections. To address these challenges, we propose a structural entropy-based
approach for MTS forecasting. The approach optimizes graph structures by reducing
structural redundancy, thereby improving forecasting accuracy. Initially, we represent
the temporal dependencies by constructing an encoding tree incrementally. Through hierarchical
organization of time steps, the temporal evolution is adaptively captured. Subsequently,
we give the community-aware representation by building an encoding tree over the variables
in MTS, extracting homogeneous communities from the tree structure while integrating
community influence to better capture inter-variable dependencies. Finally, we present
a training algorithm designed to generate accurate predictions for MTS, accompanied
by a unified loss function that integrates forecasting inaccuracies with variations
in structural entropy. Empirical findings on real-world datasets substantiate that
our approach outperforms state-of-the-art models in capturing dependencies and enhancing
forecasting precision.
OFedED: One-shot Federated Learning with Model Ensemble and Dataset Distillation
- Xuhui Li
- Zhengquan Luo
- Zihui Cui
- Xin Cao
- Zhiqiang Xu
One-shot federated learning (FL) has gained traction due to its communication efficiency
and scalability. However, unlike traditional FL, which can frequently align client
models through multiple rounds of client training and server aggregation, one-shot
FL allows only a single communication round, causing each client to easily overfit
its local data and leading to divergent objectives. Without any chance to iteratively
correct these biases or mitigate heterogeneity, the aggregated model significantly
deviates from the optimum achieved under centralized training on the full dataset. To address
this challenge, we propose OFedED, a one-shot FL framework that preserves privacy and fully exploits client data by
combining local data distillation with server-side ensemble learning. Each client
distills its own dataset into an ultracompact coreset that retains essential distributional
characteristics; the server aggregates these coresets to guide ensemble training that
captures inter-client heterogeneity, harnesses complementary knowledge, corrects local
bias, and drives performance close to centralized training. In addition, we theoretically
show that, under mild assumptions for local data distillations, the server can simulate
a centralized optimization process by finetuning on the aggregated distilled data,
effectively bypassing the need for multiple communication rounds, showing that properly
distilled data can encode sufficient task-relevant information to support centralized-level
optimization. Extensive experiments reveal that OFedED consistently and significantly
outperforms SOTA methods, achieving an improvement of up to 9.17% on MNIST and 3.97%
on CIFAR-10; its robustness is further verified by experiments using ResNet and various
server-client architectures.
Ensemble Pruning via Graph Neural Networks
- Yuanke Li
- Yiyang Liu
- Dongmian Zou
- Hongfei Wang
Ensemble learning is a pivotal machine learning strategy that combines multiple base
learners to achieve prediction accuracy surpassing that of any individual model. Despite
its effectiveness, large-scale ensemble learning consumes a considerable amount of
resources. Ensemble pruning addresses this issue by selecting a subset of base learners
from the original ensemble to form a sub-ensemble, while maintaining or even improving
the performance of the original model. However, existing ensemble pruning strategies
often rely on heuristic solutions that may fail to capture complex interactions among
base learners. To address this limitation, in this work, we model the base learners
in an ensemble as a weighted and attributed graph, where node features represent characteristics
of each learner and edge weights represent relationships between the base learners.
Leveraging this representation, we propose a novel ensemble pruning method based on
graph neural networks (GNNs). Our approach incorporates specialized GNN architectures
designed for bagging and boosting ensembles. Experimental results demonstrate that
our method not only improves prediction accuracy but also significantly reduces inference
time across diverse datasets. Our implementation is available at https://github.com/TechnologyAiGroup/GRE.
STA-GANN: A Valid and Generalizable Spatio-Temporal Kriging Approach
- Yujie Li
- Shao Zezhi
- Chengqing Yu
- Tangwen Qian
- Zhao Zhang
- Yifan Du
- Shaoming He
- Fei Wang
- Yongjun Xu
Spatio-temporal tasks often encounter incomplete data arising from missing or inaccessible
sensors, making spatio-temporal kriging crucial for inferring the completely missing
temporal information. However, current models struggle with ensuring the validity
and generalizability of inferred spatio-temporal patterns, especially in capturing
dynamic spatial dependencies and temporal shifts, and optimizing the generalizability
of unknown sensors. To overcome these limitations, we propose Spatio-Temporal Aware
Graph Adversarial Neural Network (STA-GANN), a novel GNN-based kriging framework that
improves spatio-temporal pattern validity and generalization. STA-GANN integrates
(i) a Decoupled Phase Module that senses and adjusts for timestamp shifts; (ii) Dynamic
Data-Driven Metadata Graph Modeling to update spatial relationships using temporal
data and metadata; and (iii) an adversarial transfer learning strategy to ensure generalizability.
Extensive validation across nine datasets from four fields and theoretical evidence
both demonstrate the superior performance of STA-GANN.
BordaRAG: Resolving Knowledge Conflict in Retrieval-Augmented Generation via Borda
Voting Process
- Yuxin Li
- Chen Xu
- Jun Xu
- Ji-Rong Wen
Recently, research has found that the documents retrieved in Retrieval-Augmented
Generation (RAG) may contain knowledge that conflicts across documents, leading Large
Language Models (LLMs) to generate incorrect responses. To solve such a problem, existing
approaches usually only keep the most frequently mentioned knowledge from these documents,
since they assume that the most representative knowledge aligns best with the true
answer. Although effective in certain scenarios, these approaches often underperform
when the most frequent knowledge is not the correct one. From the voting perspective,
these methods can be regarded as a Majority Voting (MV) process, which chooses the
most frequent candidates among different candidate knowledge. However, we show that
the underperformance of such methods stems from the fact that MV is only effective with a small
number of candidates and binary voting scores. In contrast, in the RAG scenario, the
candidates (knowledge) are very diverse, and the voting scores (document relevance
scores) are typically continuous. Simply adapting MV in RAG will result in poor performance
of LLMs. In voting theory, on the other hand, the preference-based voting methods
represented by the Borda Voting (BV) consider the whole preference order of voters
over all candidates, enabling the selection of candidates that better represent the
collective viewpoint. Inspired by such an insight, we propose BordaRAG, a model designed
to better select the most appropriate documents from conflicting documents. Specifically,
BordaRAG first computes the preference scores of the documents over the candidate
answers. After that, a BV component is designed to select the winning documents according
to the preference scores. Finally, the chosen documents are provided to LLMs, which
will generate the final response. Experimental results on three open-domain QA datasets
show that BordaRAG can outperform all baselines.
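For readers unfamiliar with the Borda count that motivates BordaRAG, a minimal generic implementation follows; how BordaRAG maps documents, candidate answers, and preference scores onto voters and candidates is specific to the paper and is not reproduced here.

```python
# Generic Borda count: every voter ranks all candidates; a candidate in position i
# of a ranking of length n receives n - 1 - i points, and the candidate with the
# highest total wins. Illustrative, not BordaRAG itself.
from collections import defaultdict

def borda_winner(rankings):
    """rankings: list of full preference orders (most preferred first) over candidates."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - 1 - position        # top choice gets n - 1 points
    winner = max(scores, key=scores.get)
    return winner, dict(scores)

# Three voters over candidates a, b, c: b wins with 5 points (a: 3, c: 1).
winner, totals = borda_winner([["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]])
```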
A Cost-Aware Approach for Collaborating Large Language Models and Small Language Models
- Zheng Li
- Xuyun Zhang
- Sheng Lu
- Hua Deng
- Hao Tian
- Wanchun Dou
The emerging reasoning ability of large language models (LLMs) and accompanying commercial
applications offer a promising path for service providers to deploy intelligent agents
on their own products through API calls. However, the black-box nature of LLMs has
driven providers to try prompt tuning to improve reasoning quality for competitiveness,
while the generated reasoning logic results in additional service costs. Although
some works have proposed collaborating LLMs and Small Language Models (SLMs) to reduce
the frequency of LLM calls, most overlook the actual number of tokens interacting
with the LLMs, which can still result in a high cost. Furthermore, directly
compressing the prompt to reduce tokens often leads to a significant accuracy loss.
To address the above challenges, we propose a cost-aware approach for collaborating
LLMs and SLMs, named Coco. In our method, a confidence-based task assignment method
is designed which leverages the result confidence of SLMs to assess task complexity
and determine whether LLM involvement is necessary. For complex tasks, the SLM adapts
the input by compressing unnecessary information according to confidence. Considering
the potential loss of accuracy, prompt tuning-based reasoning optimization methods
are introduced to guide the LLM in generating both the reasoning logic sketch and
the final result. Finally, logic alignment is applied to fuse sketches from both models,
ensuring the rationality of the reasoning logic. Experimental results on three open-source
datasets demonstrate that our approach effectively reduces the cost of API calls to
LLMs while ensuring the reasoning accuracy and the reasonableness of generated logic.
Differentiable Probabilistic Logic Reasoning For Knowledge Graph Completion
- Zhongbin Li
- Lixing Yu
- Kun Yue
- Xinquan Wu
Towards Knowledge Graph (KG) completion, probabilistic logic reasoning approaches
enable effective rule mining but incur high computational cost, while embedding-based
methods offer high efficiency but confront limited semantic understanding. Neuro-symbolic
approaches combine both by employing embeddings to approximate probabilistic distributions,
yet face challenges in weight optimization and hurdles in scaling to large probability
graphs. To address these issues, we propose DPLogic, a differentiable probabilistic
logic reasoning framework for KG completion. Initially, we construct a Markov logic
network by selecting crucial formulas and constraining groundings to relevant subgraphs,
effectively boosting the scalability of the framework. Subsequently, we represent
formula weights through relation-specific embeddings by introducing neural logical
operators, creating a differentiable pathway for end-to-end optimization. Finally,
we obtain the distribution of unobserved KG triplets by facilitating the joint optimization
of embedding-based and probabilistic distributions through an EM algorithm. Empirical
findings on standardized datasets illustrate that our proposed DPLogic consistently
surpasses state-of-the-art methodologies in terms of both efficacy and efficiency.
Calibrating on Kolmogorov-Arnold Network
- Wenhao Liang
- Wei Emma Zhang
- Lin Yue
- Miao Xu
- Olaf Maennel
- Weitong Chen
Kolmogorov-Arnold Networks (KANs) are neural architectures inspired by the Kolmogorov-Arnold
representation theorem that leverage B-spline parameterizations for flexible, locally
adaptive function approximation. Although KANs can capture complex nonlinearities
beyond those modeled by standard Multi-Layer Perceptrons (MLPs), they frequently exhibit
miscalibrated confidence estimates, manifesting as overconfidence in dense data regions
and underconfidence in sparse areas. In this work, we systematically examine the impact
of four critical hyperparameters (Layer Width, Grid Order, Shortcut Function, and
Grid Range) on the calibration of KANs. Furthermore, we introduce a novel Temperature-Scaled
Loss (TSL) that integrates a temperature parameter directly into the training objective,
dynamically adjusting the predictive distribution during learning. Both theoretical
analysis and extensive empirical evaluations on standard benchmarks demonstrate that
TSL significantly reduces calibration errors, thereby improving the reliability of
probabilistic predictions. Overall, our study provides actionable insights into the
design of spline-based neural networks and establishes TSL as a robust, loss-agnostic
solution for enhancing calibration.
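The general idea of folding a temperature into the training objective can be sketched as below: logits are divided by a (here, learnable) temperature before the cross-entropy is computed, so the predictive distribution is adjusted during learning. The exact TSL formulation is defined in the paper, so the learnable log-temperature parameterization is an assumption.

```python
# Generic temperature-in-the-loss sketch, not the paper's exact TSL.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemperatureScaledCE(nn.Module):
    """Cross-entropy with a learnable temperature applied inside the training loss."""
    def __init__(self):
        super().__init__()
        self.log_t = nn.Parameter(torch.zeros(1))        # temperature T = exp(log_t) > 0

    def forward(self, logits, targets):
        temperature = self.log_t.exp()
        # Dividing logits by T softens (T > 1) or sharpens (T < 1) the distribution.
        return F.cross_entropy(logits / temperature, targets)
```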
Calibrating on Medical Segmentation Model through Signed Distance
- Wenhao Liang
- Wei Emma Zhang
- Lin Yue
- Miao Xu
- Olaf Maennel
- Weitong Chen
Classical overlap metrics such as Dice or IoU quantify where a medical-image segmentation falls short but say nothing about the confidence of each prediction. Over-confident errors are particularly dangerous in clinical
practice, where a single false-positive voxel may trigger an unnecessary biopsy. We
introduce three contributions that jointly address spatial precision and reliability.
(i) Signed-Distance Calibration (SDC) loss couples cross-entropy, local calibration and a differentiable signed-distance penalty,
enforcing boundary accuracy while moderating confidence. (ii) A Spatially Adaptive Margin (SAM) module applies lightweight morphological transforms to ground-truth masks before
computing the local target, sharpening ambiguous edges. (iii) Pixel-wise Expected Calibration Error (pECE) extends ECE to millions of voxels and penalises high-confidence false positives.
Across four public datasets (ACDC, FLARE, BraTS, PROSTATE) and two backbones (U-Net,
nnU-Net), SDC improves Dice by up to 4 percentage points and halves ECE compared with
the state of the art, without sacrificing runtime. Code is available at https://github.com/EagleAdelaide/SDC-Loss.
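A standard expected calibration error computed over flattened per-pixel predictions can be sketched as follows; the paper's pECE additionally penalizes high-confidence false positives, which this generic version omits, and the binning choices are assumptions.

```python
# Generic pixel-level expected calibration error (ECE) for binary segmentation:
# flatten per-pixel foreground probabilities, bin them by confidence, and average
# the gap between confidence and accuracy.
import numpy as np

def pixel_ece(probs, labels, n_bins=15):
    """probs: per-pixel foreground probabilities; labels: binary masks (same shape)."""
    probs = probs.ravel()
    labels = labels.ravel().astype(float)
    preds = (probs >= 0.5).astype(float)
    conf = np.where(preds == 1.0, probs, 1.0 - probs)    # confidence in the predicted class
    correct = (preds == labels).astype(float)
    edges = np.linspace(0.5, 1.0, n_bins + 1)
    ece, total = 0.0, probs.size
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        mask = (conf >= lo) & ((conf < hi) if i < n_bins - 1 else (conf <= hi))
        if mask.any():
            # Gap between average confidence and accuracy, weighted by bin size.
            ece += mask.sum() / total * abs(conf[mask].mean() - correct[mask].mean())
    return float(ece)
```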
UniECS: Unified Multimodal E-Commerce Search Framework with Gated Cross-modal Fusion
- Zihan Liang
- Yufei Ma
- Zhipeng Qian
- Huangyu Dai
- Zihan Wang
- Ben Chen
- Chenyi Lei
- Yuqing Ding
- Han Li
The growth of e-commerce has created substantial demand for multimodal search systems
that process diverse visual and textual inputs. Current e-commerce multimodal retrieval
systems face two key limitations: they optimize for specific tasks with fixed modality
pairings, and lack comprehensive benchmarks for evaluating unified retrieval approaches.
To address these challenges, we introduce UniECS, a unified multimodal e-commerce
search framework that handles all retrieval scenarios across image, text, and their
combinations. Our work makes three key contributions. First, we propose a flexible
architecture with a novel gated multimodal encoder that uses adaptive fusion mechanisms.
This encoder integrates different modality representations while handling missing
modalities. Second, we develop a comprehensive training strategy to optimize learning.
It combines cross-modal alignment loss (CMAL), cohesive local alignment loss (CLAL),
intra-modal contrastive loss (IMCL), and adaptive loss weighting. Third, we create
M-BEER, a carefully curated multimodal benchmark containing 50K product pairs for
e-commerce search evaluation. Extensive experiments demonstrate that UniECS consistently
outperforms existing methods across four e-commerce benchmarks with fine-tuning or
zero-shot evaluation. On our M-BEER benchmark, UniECS achieves substantial improvements
in cross-modal tasks (up to 28% gain in R@10 for text-to-image retrieval) while maintaining
parameter efficiency (0.2B parameters) compared to larger models like GME-Qwen2VL
(2B) and MM-Embed (8B). Furthermore, we deploy UniECS in the e-commerce search platform
of Kuaishou Inc. across two search scenarios, achieving notable improvements in Click-Through
Rate (+2.74%) and Revenue (+8.33%). The comprehensive evaluation demonstrates the
effectiveness of our approach in both experimental and real-world settings. Corresponding
codes, models and datasets will be made publicly available at https://github.com/qzp2018/UniECS.
Federated Continual Recommendation
- Jaehyung Lim
- Wonbin Kweon
- Woojoo Kim
- Junyoung Kim
- Seongjin Choi
- Dongha Kim
- Hwanjo Yu
The increasing emphasis on privacy in recommendation systems has led to the adoption
of Federated Learning (FL) as a privacy-preserving solution, enabling collaborative
training without sharing user data. While Federated Recommendation (FedRec) effectively
protects privacy, existing methods struggle with non-stationary data streams, failing
to maintain consistent recommendation quality over time. On the other hand, Continual
Learning Recommendation (CLRec) methods address evolving user preferences but typically
assume centralized data access, making them incompatible with FL constraints. To bridge
this gap, we introduce Federated Continual Recommendation (FCRec), a novel task that integrates FedRec and CLRec, requiring models to learn from streaming
data while preserving privacy. As a solution, we propose F3CRec, a framework designed to balance knowledge retention and adaptation under the strict
constraints of FCRec. F3CRec introduces two key components: Adaptive Replay Memory on the client side, which selectively retains past preferences based on user-specific
shifts, and Item-wise Temporal Mean on the server side, which integrates new knowledge while preserving prior information.
Extensive experiments demonstrate that F3CRec outperforms existing approaches in maintaining recommendation quality over time
in a federated environment. Our code is available at https://github.com/Jaehyung-Lim/F3CRec-CIKM-25.
ConGM: Contrastive Graph Matching for Graph Self-Supervised Learning
- Hongxiang Lin
- Lei Wang
- Huiying Hu
- Xiaoqing Lyu
Graph neural networks (GNNs) are widely used in information retrieval, but they often
require large amounts of labeled data. To address this problem, self-supervised methods
like graph contrastive learning (GCL) are developed to learn from graph structures
without labeled data. However, GCL faces a challenge in practice: many traditional
GCL methods differ fundamentally from GNNs in handling neighboring nodes, hindering
effective contrastive learning. To address the above issue, we propose a graph self-supervised
learning model based on Contrastive Graph Matching (ConGM). The model effectively
mitigates the conflict between GCL methods and the homophily assumption of GNNs by
using linear node matching and quadratic edge alignment mechanisms to treat some neighboring
nodes as both positive and negative samples, rather than considering all neighboring
nodes as negative samples as in traditional GCL methods. Additionally, to tackle the
imbalance of positive and negative samples in edge alignment, we design a bi-level
negative sample selection strategy to choose appropriate hard negative samples. Extensive
experiments conducted on multiple benchmark datasets have validated the effectiveness
of our proposed method.
To Know What User Concerns: Conceptual Knowledge Reasoning for User Satisfaction Estimation
in E-Commerce Dialogue Systems
- Li Lin
- Yaochang Liu
- Kaiwen Xia
- Shuai Wang
With the development of generative models, dialogue systems play an important role
in many web applications, such as E-commerce and Question-Answering websites. Accurate
user satisfaction estimation (USE) is a critical problem in measuring the
quality of dialogue systems. In e-commerce, users usually seek consultation through
dialogue systems to obtain detailed information about the products they intend to purchase.
Existing studies mainly focus on analyzing user sentiment in a dialogue for USE, neglecting
to understand what the user is concerned about when requesting a consultation. It
may cause fatal errors when the response is emotionally friendly but non-informative.
Thus, to evaluate how a dialogue satisfies the user's requirements, it is essential
to have a conceptual understanding of the products to determine if the response has
addressed the user's question. In this paper, we propose a knowledge-enhanced USE
model named CoRe-USE, which introduces the Conceptual Knowledge Reasoning for USE in E-Commerce Dialogue Systems. We first design a simple yet efficient entity linking
and relation selection module enabling conceptual reasoning in each dialogue. Then,
we propose a hierarchical encoder to capture the contextual information in multi-turn
dialogues. Finally, we introduce a knowledge enhancement module to fuse conceptual
reasoning into contextual embeddings to produce USE. For evaluation, we conduct experiments
on three real-world datasets in various scenarios, and the results demonstrate the effectiveness
and robustness of CoRe-USE compared with SOTA baselines.
SC-DAG: Semantic-Constrained Diffusion Attacks for Stealthy Exposure Manipulation
in Visually-Aware Recommender Systems
- Ze Lin
- Yuqiu Qian
- Xiaodong Li
- Ziyu Lyu
- Hui Li
Visually-aware recommender system (VARS) has become increasingly prevalent in various
online services by integrating visual features of items to enhance recommendation
quality. However, VARS introduces new security vulnerabilities and malicious attackers
can perform visual shilling attacks to manipulate recommendation lists via uploading
generated images with visually imperceptible perturbations. While prior research has
explored such threats to help service providers enhance their systems, existing visual
shilling attack methods still suffer from uncontrolled pixel-space perturbation, energy
dispersion dilemma and semantic misalignment in reference selection. In this work,
we present Semantic-Constrained Diffusion Adversarial Generation (SC-DAG) for visual
shilling attacks. SC-DAG overcomes key limitations of previous methods by focusing
perturbations on semantically meaningful image regions through contour-aware segmentation,
guiding adversarial generation in latent space using a conditional diffusion process,
and performing a hybrid reference image selection strategy that balances popularity
and semantic similarity. Extensive experiments on performing visual shilling attacks
against multiple VARS models show that SC-DAG achieves state-of-the-art attack performance
in elevating target items' ranking, while maintaining strong perceptual indistinguishability
and minimal impact on overall recommendation performance of the system. Our work offers
insights into leveraging structured semantic priors for more sophisticated adversarial
manipulations against VARS and also highlights the necessity for developing more robust
VARS models resilient to visual shilling attacks. We provide our implementation at
https://github.com/KDEGroup/SC-DAG.
Crocodile: Cross Experts Covariance for Disentangled Learning in Multi-Domain Recommendation
- Zhutian Lin
- Junwei Pan
- Haibin Yu
- Xi Xiao
- Ximei Wang
- Zhixiang Feng
- Shifeng Wen
- Shudong Huang
- Dapeng Liu
- Lei Xiao
Multi-domain learning (MDL) has become a prominent topic in enhancing the quality
of personalized services. It's critical to learn commonalities between domains and
preserve the distinct characteristics of each domain. However, this leads to a challenging
dilemma in MDL. On the one hand, a model needs to leverage domain-aware modules such
as experts or embeddings to preserve each domain's distinctiveness. On the other hand,
real-world datasets often exhibit long-tailed distributions across domains, where
some domains may lack sufficient samples to effectively train their specific modules.
Unfortunately, nearly all existing work falls short of resolving this dilemma. To
this end, we propose a novel Cross-experts Covariance Loss for Disentangled Learning
model (Crocodile), which employs multiple embedding tables to make the model domain-aware
at the embedding level, where most of the model's parameters reside, and a covariance loss
upon these embeddings to disentangle them, enabling the model to capture diverse user
interests among domains. Empirical analysis demonstrates that our method successfully
addresses both challenges and outperforms all state-of-the-art methods on public datasets.
During online A/B testing in Tencent's advertising platform, Crocodile achieves 0.72%
CTR lift and 0.73% GMV lift on a primary advertising scenario. The code is openly
accessible at: https://github.com/SkylerLinn/Crocodile.
EAPformer: Entropy-Aware Patch Transformer for Multivariate Long-Term Time Series
Forecasting
- Jiahao Ling
- Xuan Yang
- Shimin Gong
- Bo Gu
Multivariate long-term time series forecasting is pivotal across numerous domains,
yet precise predictions require a differentiated assessment of historical time segments
due to their varying influence on future trends. Patch-based Transformer frameworks
show promise for capturing local temporal patterns. However, they face limitations
with static patching, which disrupts temporal continuity, fails to adapt to shifts
between periodic and volatile patterns, and overlooks dynamic interactions between
time segments and variables. To address these limitations, we propose Entropy-Aware
Patch Transformer (EAPformer) which dynamically segments time series for differentiated
assessments of historical patterns. Specifically, we overcome static patching limitations
by leveraging temporal entropy to dynamically adjust patch boundaries through a two-stage
policy, achieving interpretable and context-sensitive segmentation. Subsequently,
we adapt EAPformer to periodic and volatile dynamics by employing entropy-aware segmentation
that captures distinct temporal patterns across diverse segments. Finally, we further
capture dynamic interactions across time segments and variables by introducing a multi-dimensional
dependency learning architecture. Additionally, a gated fusion mechanism integrates
local and global patterns, enhancing robustness. Extensive experiments on eight public
benchmarks demonstrate that EAPformer outperforms state-of-the-art models, achieving
superior accuracy across all metrics.
From Intents to Conversations: Generating Intent-Driven Dialogues with Contrastive
Learning for Multi-Turn Classification
- Junhua Liu
- Yong Keat Tan
- Bin Fu
- Kwan Hui Lim
In conversational AI systems, a critical challenge in training effective multi-turn
intent classification models lies in the generation of large-scale, domain-specific,
multilingual dialogue datasets. In this paper, we introduce Chain-of-Intent, a novel
framework that integrates Hidden Markov Models (HMMs) with Large Language Models (LLMs)
to generate intent-driven, context-aware dialogues through self-play. Our method first
extracts domain-specific intent transition patterns from real-world e-commerce chat
logs, which guide the modeling of turn-level dynamics and intent sequences. LLMs are
then employed to parameterize the emission probabilities of HMMs, enabling the generation
of natural, coherent utterances aligned with predicted intents and dialogue context.
We also propose MINT-CL, a multi-task contrastive learning framework for multi-turn
intent classification, which improves performance while reducing dependence on large-scale
annotated datasets. Empirical results demonstrate that our approach outperforms competitive
baselines in dialogue generation quality and classification accuracy, particularly
in multilingual settings. To facilitate future research, we release MINT-E, a comprehensive,
multilingual, intent-aware multi-turn dialogue corpus derived from the e-commerce
domain.
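To make the HMM-plus-LLM generation loop concrete, here is a minimal sketch in which a Markov chain over intents (estimated from chat logs) drives the turn sequence and each utterance is delegated to an LLM; generate_utterance is a hypothetical stub, not part of the released Chain-of-Intent or MINT-E code.

```python
# Minimal sketch of intent-driven dialogue generation: a Markov chain over intents
# drives the turn sequence, and an LLM realizes each turn. Illustrative only.
import numpy as np

def sample_intent_sequence(start_probs, transition, n_turns, rng=None):
    """Sample a sequence of intent indices from a first-order Markov chain."""
    rng = rng or np.random.default_rng()
    intents = [int(rng.choice(len(start_probs), p=start_probs))]
    for _ in range(n_turns - 1):
        intents.append(int(rng.choice(len(start_probs), p=transition[intents[-1]])))
    return intents

def generate_utterance(prompt: str) -> str:
    """Placeholder for the LLM call that parameterizes the chain's emissions."""
    return f"<LLM output for: {prompt}>"

def generate_dialogue(intent_names, start_probs, transition, n_turns=4):
    dialogue = []
    for turn, intent in enumerate(sample_intent_sequence(start_probs, transition, n_turns)):
        prompt = f"Turn {turn + 1}: write a user utterance with intent '{intent_names[intent]}'."
        dialogue.append((intent_names[intent], generate_utterance(prompt)))
    return dialogue
```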
Improved Personalized Headline Generation via Denoising Fake Interests from Implicit
Feedback
- Kejin Liu
- Junhong Lian
- Xiang Ao
- Ningtao Wang
- Xing Fu
- Yu Cheng
- Weiqiang Wang
- Xinyu Liu
Accurate personalized headline generation hinges on precisely capturing user interests
from historical behaviors. However, existing methods neglect personalization-irrelevant
click noise in entire historical clickstreams, which may lead to hallucinated headlines
that deviate from genuine user preferences. In this paper, we reveal the detrimental
impact of click noise on personalized generation quality through rigorous analysis
in both user and news dimensions. Based on these insights, we propose a novel Personalized
Headline Generation framework via Denoising Fake Interests from Implicit Feedback
(PHG-DIF). PHG-DIF first employs dual-stage filtering to effectively remove clickstream
noise, identified by short dwell times and abnormal click bursts, and then leverages
multi-level temporal fusion to dynamically model users' evolving and multi-faceted
interests for precise profiling. Moreover, we release DT-PENS, a new benchmark dataset
comprising the click behavior of 1,000 carefully curated users and nearly 10,000 annotated
personalized headlines with historical dwell time annotations. Extensive experiments
demonstrate that PHG-DIF substantially mitigates the adverse effects of click noise
and significantly improves headline quality, achieving state-of-the-art (SOTA) results
on DT-PENS. Our framework implementation and dataset are available at https://github.com/liukejin-up/PHG-DIF.
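The dual-stage filtering idea can be illustrated with a small sketch that first drops clicks with short dwell times and then drops clicks inside abnormally dense bursts; the thresholds and the toy clickstream are assumptions for illustration, not values from PHG-DIF.

```python
# Illustrative dual-stage clickstream filtering: short dwell times, then bursts.
from datetime import datetime, timedelta

def filter_clicks(clicks, min_dwell_s=5, burst_window_s=10, burst_limit=3):
    """clicks: list of dicts with 'ts' (datetime) and 'dwell' (seconds)."""
    # Stage 1: drop clicks with very short dwell time.
    kept = [c for c in clicks if c["dwell"] >= min_dwell_s]
    # Stage 2: keep at most `burst_limit` clicks inside any sliding burst window.
    kept.sort(key=lambda c: c["ts"])
    result, window = [], []
    for c in kept:
        window = [w for w in window
                  if (c["ts"] - w["ts"]).total_seconds() <= burst_window_s]
        window.append(c)
        if len(window) <= burst_limit:
            result.append(c)
    return result

t0 = datetime(2024, 1, 1)
clicks = [{"ts": t0 + timedelta(seconds=i), "dwell": d}
          for i, d in enumerate([2, 30, 40, 1, 25, 28, 31, 27])]
print(len(filter_clicks(clicks)))  # number of clicks surviving both stages
```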
AdaPatch: Adaptive Patch-Level Modeling for Non-Stationary Time Series Forecasting
- Kun Liu
- Zhongjie Duan
- Cen Chen
- Yanhao Wang
- Dawei Cheng
- Yuqi Liang
Time series forecasting has witnessed significant advancements through deep learning
techniques. However, most existing methods struggle in non-stationary environments,
where data distributions evolve over time due to concept drift. To address the challenge
of non-stationarity in time series, various stabilization techniques have been proposed
to mitigate temporal variations. Nonetheless, these methods operate at the instance
level, assuming a homogeneous distribution across all time steps within an instance
and relying on fixed statistical normalization. This limits their ability to effectively
capture fine-grained distributional shifts.
In this paper, we introduce AdaPatch, a novel forecasting model specifically designed to tackle non-stationary multivariate
time series. AdaPatch addresses intra-instance distributional shifts by adopting an
adaptive scheme for patch-level encoding and normalization, which makes the model
capture fine-grained temporal variations more effectively. To further enhance the
quality of representations, AdaPatch incorporates a patch reconstruction branch and
jointly optimizes a reconstruction loss alongside the forecasting objective. This
auxiliary path serves as an implicit regularization mechanism, guiding the encoder
to retain meaningful local temporal structures. Furthermore, to enable AdaPatch to
better model complex local dynamics, we propose a patch-based predictive decoding
strategy that leverages the decoder from the reconstruction branch to replace conventional
point-wise forecasting with a more structured patch-level prediction mechanism. Extensive
experiments conducted on six real-world multivariate time series datasets demonstrate
that AdaPatch achieves superior performance compared to several state-of-the-art baselines,
highlighting its effectiveness and strong generalization capability. Our code and
data are publicly available at https://github.com/iuaku/AdaPatch.
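A minimal sketch of patch-level (rather than instance-level) normalization, assuming a fixed patch length; this only illustrates the general idea of per-patch statistics and is not AdaPatch's adaptive scheme.

```python
# Patch-level normalization: statistics are computed per patch, not per instance.
import numpy as np

def patchify(x, patch_len):
    """Split a (T, C) series into (num_patches, patch_len, C)."""
    num = len(x) // patch_len
    return x[:num * patch_len].reshape(num, patch_len, x.shape[-1])

def patch_normalize(x, patch_len=16, eps=1e-5):
    patches = patchify(x, patch_len)
    mean = patches.mean(axis=1, keepdims=True)
    std = patches.std(axis=1, keepdims=True) + eps
    return (patches - mean) / std, (mean, std)

series = np.cumsum(np.random.randn(128, 3), axis=0)  # drifting toy series
normed, stats = patch_normalize(series)
print(normed.shape)  # (8, 16, 3)
```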
LLMCBR: Large Language Model-based Multi-View and Multi-Grained Learning for Bundle
Recommendation
- Shiqin Liu
- Chaozhuo Li
- Minjun Zhao
- Litian Zhang
- Jiajun Bu
The exploration of bundle recommendation has garnered significant attention for its
potential to enhance user experience and augment business sales. Previous research
in this domain has primarily focused on modeling user-item and user-bundle interactions,
utilizing multi-view collaboration to bolster the accuracy of bundle recommendations.
Nevertheless, existing methodologies exhibit limitations, notably in the inadequate
modeling of multi-view information and the absence of multi-grained details. Consequently,
addressing the intricate correlation among users, items, and bundles necessitates
a sophisticated approach capable of capturing both global and local nuances. We present
a novel framework named Large Language Model-based Multi-View and Multi-Grained Learning
for Bundle Recommendation (LLMCBR). We introduce an LLM-based semantic refinement
module to summarize and encode bundle-level knowledge. To bridge the gap between semantic
representation and collaborative signals, we design an adaptation strategy. Furthermore,
LLMCBR leverages multi-view and multi-granular modeling to unify collaborative signals.
Specifically, LLMCBR integrates item preferences within both bundle-view and item-view,
thereby augmenting the comprehensiveness of multi-view data. Following this integration,
each view undergoes stratification into multiple granularities to facilitate the acquisition
of multi-grained details. We introduce a multiple contrastive instance mechanism to
regulate the influence of different granularities and views. This mechanism empowers
the model to comprehend complex consumer behaviors across various dimensions. LLMCBR
is extensively evaluated over three real-world datasets, and the experimental results
demonstrate its superiority.
Masked Graph Distance Network for Accurate Subgraph Similarity Computation
- Xijuan Liu
- Yin Chen
- Fan Li
- Xiaoyang Wang
- Haiyang Hu
- Ying Zhang
Subgraph similarity search aims to identify target graphs in the database that approximately
contain the query graph, which is a fundamental problem in graph analysis. As a key
measure for subgraph similarity computation, Subgraph Edit Distance (SED) has garnered
significant research attention. Unfortunately, the exact computation of SED is an
NP-hard problem. In recent years, some studies have attempted to leverage Graph Neural
Networks (GNNs) to learn SED. However, existing GNN-based methods suffer from two
significant limitations: (1) They rely on node-centric message passing, which cannot
fully capture the impact of graph topology changes caused by graph edit operations.
(2) They struggle to handle the asymmetry of SED, making it challenging to balance
the scale differences between the input graphs and their distances in the representation
space. To address these issues, this paper proposes a novel Masked Graph Distance
Network (MGDN) for accurate SED approximation. First, MGDN utilizes a unified graph
encoder to perform message passing based on the original graph structure and its dual
hypergraph, effectively capturing the impact of node- and edge-specific edits. Then,
we introduce an adaptive graph masking module that flexibly assigns masking scores
to nodes and edges in the target graph to address asymmetry. Using multi-head masking,
we re-encode the input graphs to focus on substructures relevant to SED computation.
Finally, a multi-view predictor is employed at the graph level to approximate the
SED, enhancing estimation accuracy by integrating information from multiple perspectives.
Extensive experiments on nine benchmark datasets demonstrate that MGDN significantly
outperforms state-of-the-art methods.
Seeing Sequences like Humans: Pattern Classification Driven Time-Series Forecasting
via Vision Language Models
- Xingyu Liu
- Min Gao
- Zongwei Wang
- Yinbing Bai
Time-series forecasting is critical to highly data-dependent domains such as energy,
healthcare, and transportation. Although Large Language Models have recently been
explored for this task, their performance is hindered by a modality gap: numerical
sequences poorly align with text-based inputs, and direct alignment often introduces
noise. In contrast, human experts rarely predict directly from numbers; they first
inspect line charts to recognize overall patterns and then apply simple models for
forecasting. Inspired by this workflow, we propose VisMoE, a Vision-Language-Model-driven
Mixture-of-Experts framework. In VisMoE, each sequence is transformed into a line-chart
image, enabling a VLM to classify it into distinct temporal regimes. Based on this
classification, VisMoE routes the sequence to lightweight specialized experts operating
alongside a global predictor, whose outputs are fused for final forecasts. This human-inspired
design preserves semantic understanding, reduces modality misalignment, and improves
computational efficiency. Extensive experiments across multiple benchmarks demonstrate
that VisMoE achieves state-of-the-art forecasting accuracy while remaining highly
efficient. Our code is available at https://github.com/Liu905169/VisMoE.
Enabling Group Fairness in Machine Unlearning via Distribution Correction
Machine unlearning is a recently developed technique to remove the influence of specific
data points from a trained model. However, most machine unlearning approaches focus
on preserving model performance, which may inadvertently introduce bias. From a preliminary
study, we found that a model can become more biased after applying unlearning algorithms.
To address this issue, we propose FMU (Fair Machine Unlearning), which ensures group
fairness throughout the unlearning process. Specifically, FMU first withdraws the
model updates for batches containing unlearning requests to protect privacy. It then
removes model updates from additional sampled batches that carry reversed sensitive
attributes linked to the same requests, mitigating newly introduced bias. Our experiments
compare FMU with standard machine unlearning baselines and one fair unlearning method.
Results show that FMU achieves superior fairness while maintaining privacy and delivering
accuracy comparable to full retraining. Furthermore, FMU remains effective across
diverse unlearning requests involving varying data distributions. Being orthogonal
to specific unlearning and debiasing techniques, FMU provides a flexible foundation
for more advanced fair machine unlearning research.
Enhancing Recommendation with Reliable Multi-profile Alignment and Collaborative-aware
Contrastive Learning
- Yibin Liu
- Jianyu Zhang
- Shijian Li
Recent studies have explored the integration of Large Language Models (LLMs) into
recommender systems to enhance the semantic understanding of users and items. While
traditional collaborative filtering approaches primarily rely on interaction histories,
LLM-enhanced methods attempt to construct comprehensive profiles by leveraging descriptive
metadata and user-generated reviews. The semantic representations of these profiles
are then aligned with recommender embeddings to enhance the performance of recommender
systems. However, the effectiveness of such approaches heavily depends on the quality
of the generated profiles, which face several critical challenges: inaccurate profiles,
insufficient information, and an information gap between semantic representations and
recommender embeddings. To tackle these challenges, we propose a novel framework with
reliable multi-profile alignment and collaborative-aware contrastive learning. Specifically,
we introduce a profile generation method combining Chain-of-Thought (CoT) prompting
and self-reflection to address the issue of inaccurate profiles. To alleviate the
problem of insufficient information, we introduce an interactive profile construction
mechanism that aggregates and summarizes common characteristics from users' and items'
neighbors in the user-item graph. To bridge the information gap between semantic representations
and recommender embeddings, we propose interactive information fusion (IIF), which
aggregates semantic representations from neighbors and employs supervised contrastive
learning to guide representation learning. Furthermore, we propose a multi-profile
alignment framework that aligns recommender embeddings with both basic profiles and
interactive profiles through deduplicated contrastive objectives, facilitating effective
semantic-behavioral alignment. Extensive experiments on three public datasets and
six base recommenders demonstrate that our method consistently outperforms strong
LLM-based baselines, achieving an average improvement of 2.93% in Recall@20 and 2.64%
in NDCG@20.
Structure-Attribute Transformations with Markov Chain Boost Graph Domain Adaptation
- Zhen Liu
- Yongtao Zhang
- Shaobo Ren
- Yuxin You
Graph domain adaptation has gained significant attention in label-scarce scenarios
across different graph domains. Traditional approaches to graph domain adaptation
primarily focus on transforming node attributes over raw graph structures and aligning
the distributions of the transformed node features across networks. However, these
methods often struggle with the underlying structural heterogeneity between distinct
graph domains, which leads to suboptimal distribution alignment. To address this limitation,
we propose Structure-Attribute Transformation with Markov Chain (SATMC), a novel framework
that sequentially aligns distributions across networks via both graph structure and
attribute transformations. To mitigate the negative influence of domain-private information
and further enhance the model's generalization, SATMC introduces a private domain
information reduction mechanism and an empirical Wasserstein distance. Theoretical
proofs suggest that SATMC can achieve a tighter error bound for cross-network node
classification compared to existing graph domain adaptation methods. Extensive experiments
on nine pairs of publicly available cross-domain datasets show that SATMC outperforms
state-of-the-art methods in the cross-network node classification task. The code is
available at https://github.com/GiantZhangYT/SATMC.
Chunked Data Shapley: A Scalable Dataset Quality Assessment for Machine Learning
- Andreas Loizou
- Dimitrios Tsoumakos
As the volume and diversity of available datasets continue to increase, assessing
data quality has become crucial for reliable and efficient Machine Learning analytics.
A modern, game-theoretic approach for evaluating data quality is the notion of Data
Shapley which quantifies the value of individual data points within a dataset. State-of-the-art
methods to scale the NP-hard Shapley computation also face severe challenges when
applied to large-scale datasets, limiting their practical use. In this work, we present
a Data Shapley approach to identify a dataset's high-quality data tuples, Chunked
Data Shapley (C-DaSh). C-DaSh scalably divides the dataset into manageable chunks
and estimates the contribution of each chunk using optimized subset selection and
single-iteration stochastic gradient descent. This approach drastically reduces computation
time while preserving high-quality results. We empirically benchmark our method on
diverse real-world classification and regression tasks, demonstrating that C-DaSh
outperforms existing Shapley approximations in both computational efficiency (achieving
speedups between 80× and 2300×) and accuracy in detecting low-quality data regions.
Our method enables practical measurement of dataset quality on large tabular datasets,
supporting both classification and regression pipelines.
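A simplified sketch of chunk-level valuation in the spirit of C-DaSh: each chunk is scored by the validation-accuracy gain it adds on top of random subsets of the other chunks, using a single-pass SGD learner. The chunk count, number of trials, and toy data are assumptions, not the paper's estimator.

```python
# Simplified chunk valuation: marginal validation gain over random chunk subsets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def chunk_values(X, y, X_val, y_val, n_chunks=10, trials=5, seed=0):
    rng = np.random.default_rng(seed)
    idx_chunks = np.array_split(rng.permutation(len(X)), n_chunks)

    def score(idx):
        clf = SGDClassifier(max_iter=1, tol=None, random_state=0)  # single SGD pass
        clf.fit(X[idx], y[idx])
        return accuracy_score(y_val, clf.predict(X_val))

    values = np.zeros(n_chunks)
    for k in range(n_chunks):
        for _ in range(trials):
            others = [i for i in range(n_chunks) if i != k]
            base = rng.choice(others, size=len(others) // 2, replace=False)
            base_idx = np.concatenate([idx_chunks[i] for i in base])
            values[k] += (score(np.concatenate([base_idx, idx_chunks[k]]))
                          - score(base_idx)) / trials
    return values

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
print(np.round(chunk_values(X_tr, y_tr, X_val, y_val), 3))
```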
Harnessing Large Language Models for Group POI Recommendations
- Jing Long
- Liang Qu
- Junliang Yu
- Tong Chen
- Quoc Viet Hung Nguyen
- Hongzhi Yin
The rapid proliferation of Location-Based Social Networks (LBSNs) has underscored
the importance of Point-of-Interest (POI) recommendation systems in enhancing user
experiences. While individual POI recommendation methods leverage users' check-in
histories to provide personalized suggestions, they struggle to address scenarios
requiring group decision-making. Group POI recommendation systems aim to satisfy the
collective preferences of multiple users, but existing approaches face two major challenges:
diverse group preferences and extreme data sparsity in group check-in data. To overcome
these challenges, we propose LLMGPR, a novel framework that leverages large language
models (LLMs) for group POI recommendations. LLMGPR introduces semantic-enhanced POI
tokens and incorporates rich contextual information to model the diverse and complex
dynamics of group decision-making. To further enhance its capabilities, we developed
a sequencing adapter using Quantized Low-Rank Adaptation (QLoRA), which aligns LLMs
with group POI recommendation tasks. To address the issue of sparse group check-in
data, LLMGPR employs an aggregation adapter that integrates individual representations
into meaningful group representations. Additionally, a self-supervised learning (SSL)
task is designed to predict the purposes of check-in sequences (e.g., business trips
and family vacations), thereby enriching group representations with deeper semantic
insights. Extensive experiments demonstrate the effectiveness of LLMGPR, showcasing
its ability to significantly enhance the accuracy and robustness of group POI recommendations.
ParaStyleTTS: Toward Efficient and Robust Paralinguistic Style Control for Expressive
Text-to-Speech Generation
- Haowei Lou
- Hye-young Paik
- Wen Hu
- Lina Yao
Controlling speaking style in text-to-speech (TTS) systems has become a growing focus
in both academia and industry. While many existing approaches rely on reference audio
to guide style generation, such methods are often impractical due to privacy concerns
and limited accessibility. More recently, large language models (LLMs) have been used
to control speaking style through natural language prompts; however, their high computational
cost, lack of interpretability, and sensitivity to prompt phrasing limit their applicability
in real-time and resource-constrained environments. In this work, we propose ParaStyleTTS,
a lightweight and interpretable TTS framework that enables expressive style control
from text prompts alone. ParaStyleTTS features a novel two-level style adaptation
architecture that separates prosodic and paralinguistic speech style modeling. It
allows fine-grained and robust control over factors such as emotion, gender, and age.
Unlike LLM-based methods, ParaStyleTTS maintains consistent style realization across
varied prompt formulations and is well-suited for real-world applications, including
on-device and low-resource deployment. Experimental results show that ParaStyleTTS
generates high-quality speech with performance comparable to state-of-the-art LLM-based
systems while being 30x faster, using 8x fewer parameters, and requiring 2.5x less
CUDA memory. Moreover, ParaStyleTTS exhibits superior robustness and controllability
over paralinguistic speaking styles, providing a practical and efficient solution
for style-controllable text-to-speech generation. Demo can be found at https://parastyletts.github.io/ParaStyleTTS_Demo/.
Code can be found at https://github.com/haoweilou/ParaStyleTTS.
Dual Denoising Diffusion Model for Session-based Social Recommendation
- Mengying Lu
- Hai-Tao Zheng
- Lan Zhou
- Qi Li
- Jinxiao Shan
- Zhixing Li
- Hong-Gee Kim
Session-based Social Recommendation (SSR) enhances item recommendations by incorporating
both session interactions and social network data. Despite recent progress, existing
SSR methods-primarily based on Graph Neural Networks-are highly susceptible to session
noise (irrelevant or unintentional interactions) and social noise (misleading signals
from connected users). Prior denoising strategies often rely on heuristic resampling
or reweighting techniques, which lack generalizability and robustness across diverse
datasets. In this work, we explore a novel direction by introducing diffusion models
for denoising in SSR. However, applying diffusion to SSR presents unique challenges
due to heterogeneous data modalities, incompatible noise patterns, and the absence
of semantic guidance during the reverse process. To overcome these challenges, we
propose D3MRec, a Dual Denoising Diffusion Model specifically designed for SSR. D3MRec employs a dual-branch architecture that independently models session sequences
and social graphs, applying denoising diffusion in their respective hidden representation
spaces. This decoupled design preserves the structural integrity of each modality
while enabling modality-specific denoising. Moreover, we introduce cross-modal guidance
by leveraging collaborative signals from the other branch during the reverse diffusion
process, enhancing alignment between session intents and social preferences. The dual
denoising processes not only mitigate noise within each modality but also serve as
mutual priors, facilitating robust and consistent representation learning across modalities.
Extensive experiments on multiple benchmarks show that D3MRec significantly outperforms state-of-the-art models, particularly under noisy conditions,
demonstrating its effectiveness and robustness.
Collaborative Interest Mining Network for Knowledge Graph-based Recommendation
- Jie Luo
- Ying Pan
- Guoliang Huang
Knowledge graphs contain rich semantic information and have been widely applied in
recommender systems. However, most existing knowledge graph-based recommendation methods
primarily focus on modeling item-side semantics and overlook the critical role of
knowledge graphs in enhancing user representations. In this work, we propose a novel
recommendation model called Collaborative Interest Mining Network for Knowledge Graph-based
Recommendation (CIMNK), which leverages the knowledge graph to mine collaborative
interest similarity (i.e., the similarity between users who share the same interests),
thereby enhancing the quality of user embeddings. Specifically, CIMNK first constructs
a user-interest graph by performing fine-grained filtering over entities and distinguishing
between different relation types, to explicitly represent direct associations between
users and interest entities. Subsequently, CIMNK introduces the Relation-aware Collaborative
Interest Mining Module (RCIM), which conducts graph representation learning on the
user-interest graph to mine and integrate collaborative interest information across
different relation types. Finally, we design an interest-aware loss function to supervise
the learning of collaborative interest similarity. Extensive experiments on three
public benchmark datasets demonstrate that CIMNK outperforms state-of-the-art methods.
The implementations are available at: https://github.com/JieLuoRoger/CIMNK-Pytorch.
LEI: Reinforced Multi-Object Cache Admission
- Hexuan Lv
- Yuhai Zhao
- Sice Wang
The Hot Object Cache (HOC) admission policy is one of the core technologies in Content
Delivery Network (CDN) cache management and plays a critical role in the broader Computing
Power Network (CPN) environment. It primarily employs heuristic- and threshold-based
methods. These methods are simple and efficient but fail both to capture latent dependencies
among requested objects and to support informed admission decisions. We propose a
reinforcement learning approach called LEI for multi-object CDN admission (i.e., sequences
of request objects across multiple time instants, which include multiple objects requested
at each instant). LEI integrates three key components: a Multi-Object Time Encoding
(MO-TE) mechanism, which uses density-based representations to model multi-object
temporal sequences; a Buffer Mechanism, which addresses the reward-latency issue by
buffering historical decision information; and a dual-head network architecture with
a two-stage training strategy, which enhances the model's admission decision-making
capability and long-term stability. Experiments on four public datasets demonstrate
that LEI improves the HOC request hit rate by 5-12% and the HOC byte hit rate by 5-42%
compared with state-of-the-art methods. It also reduces disk cache (DC) read and write
rates.
MARM: Unlocking the Recommendation Cache Scaling-Law through Memory Augmentation and
Scalable Complexity
- Xiao Lv
- Jiangxia Cao
- Shijie Guan
- Xiaoyou Zhou
- Zhiguang Qi
- Yaqiang Zang
- Ben Wang
- Guorui Zhou
Scaling-law has guided the language model design for past years, e.g., GPTs, enabling
the estimation of expected model performance with respect to the size of learnable
parameters and the scale of training samples. It is worth noting that the scaling
laws of NLP cannot be directly applied to recommendation systems due to the following
reasons: (1) The amount of training samples and model parameters is typically not
the bottleneck for the model. Our recommendation system can generate over 50 billion
user samples daily, and such a massive amount of training data can easily allow our
model parameters to exceed 200 billion, surpassing many LLMs (about 100B). (2) It
is essential to control FLOPs carefully in recommendation system. In training, we
need to process a vast number of recommendation samples every day. During online inference,
we must respond within milliseconds (LLMs usually take a few seconds). Considering
the above differences with LLM, we can conclude that: for a RecSys model, compared
to model parameters, the FLOPs is a more expensive factor that requires careful control.
In this paper, we propose our milestone work, MARM (Memory Augmented Recommendation
Model), which explores a new cache scaling-law successfully. By caching part of complex
module calculation results, our MARM extends the single-layer attention-based sequence-interest
modeling module to a multi-layer setting with minor inference FLOPs cost
(i.e., module time complexity O(n²·d) → O(n·d)). Equipped with the cache idea, our MARM solution significantly overcomes
computational bottlenecks and can seamlessly empower all interest extraction modules
for user sequences, and even other models. To support our MARM, we construct a 60TB
cache storage center for offline training and online serving. Comprehensive experiment
results show that our MARM brings offline 0.43% GAUC improvements and online 2.079%
play-time per user gains. Our MARM has been deployed on a real-world short-video platform,
serving tens of millions of users daily.
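The cache idea can be illustrated with a toy sketch that stores an expensive per-(user, prefix) module output once and reuses it on later calls; the key scheme and the stand-in module below are assumptions, not MARM's production design.

```python
# Toy sketch of trading cache storage for inference FLOPs.
import numpy as np

cache = {}

def expensive_module(user_seq):
    # Stand-in for a multi-layer attention block over the user behavior sequence.
    return user_seq.mean(axis=0)

def cached_interest(user_id, upto, user_seq):
    key = (user_id, upto)            # hypothetical key: user id and prefix length
    if key not in cache:
        cache[key] = expensive_module(user_seq[:upto])
    return cache[key]

seq = np.random.randn(500, 64)          # 500 behaviors, 64-d embeddings
v1 = cached_interest("u1", 300, seq)    # computed once
v2 = cached_interest("u1", 300, seq)    # served from cache, no recomputation
print(np.allclose(v1, v2))
```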
MetaCAN: Improving Generalizability of Few-shot Anomaly Detection with Meta-learning
- Zhisheng Lv
- Jianfeng Zhang
- Songlei Jian
- Chenlin Huang
- Hongguang Zhang
- Guansong Pang
- Zhong Liu
Few-shot Anomaly Detection (AD) for images aims to detect anomalies with few-shot
normal samples from the target dataset. It is a crucial task when only a few samples
can be obtained, and it is challenging since the model needs to generalize to different
domains. Existing methods try to enhance the generalizability of AD by incorporating
large vision-language models (LVLMs). However, how to transform category semantic information
in LVLMs into anomaly information to improve the generalizability of AD remains a
challenge facing existing methods. To address the challenge, we propose a few-shot
AD method called MetaCAN, a novel category-to-anomaly network trained with AD meta-learning
scheme based on an LVLM. Specifically, MetaCAN constructs the auxiliary training data
and multiple tasks based on different categories to perform AD meta-learning, which
steers the optimization toward optimal anomaly detection
across all categories. Moreover, MetaCAN introduces an image-image anomaly discriminator
and an image-text anomaly detector to fully exploit the powerful multimodal semantic
representations during auxiliary training. Once trained on auxiliary datasets, MetaCAN
can be applied directly to other target datasets without retraining. Extensive experiments
on six real-world datasets demonstrate that MetaCAN achieves state-of-the-art performance
on cross-domain and cross-category anomaly detection tasks compared with existing
methods.
Multimodal Sentiment Analysis with Multi-Perspective Thinking via Large Multimodal
Models
- Juhao Ma
- Shuai Xu
- Yicong Li
- Xiaoming Fu
Multimodal sentiment analysis (MSA) is attracting increasing attention from researchers.
Existing studies on MSA typically rely on surface-level feature extraction and fusion
that can be directly obtained from multimodal data, which may often ignore the underlying
semantic connection between images and texts. Recent progress in large multimodal
models (LMMs) has demonstrated their impressive reasoning abilities, which can be
leveraged to improve traditional MSA approaches by providing a deeper understanding
of the semantic connection of the modalities. To address this issue, in this paper, we
propose a novel framework called MPT that combines traditional MSA approaches with Multi-Perspective Thinking from LMMs to improve prediction outcomes. Specifically, MPT instructs the
traditional multimodal deep learning models to understand multiple-perspective rationales
for different sentiment polarities, augmenting its knowledge base and enhancing its
ability to make more accurate predictions. Extensive experiments on four refined datasets
show that MPT can not only deliver better performance compared with existing methods,
but also demonstrate good cross-modal understanding ability for recognizing user sentiment.
The codes and datasets can be accessed here: https://github.com/RMJHQwQ/MPT.
Reconsidering the Performance of GAE in Link Prediction
- Weishuo Ma
- Yanbo Wang
- Xiyuan Wang
- Muhan Zhang
Recent advancements in graph neural networks (GNNs) for link prediction have introduced
sophisticated training techniques and model architectures. However, reliance on outdated
baselines may exaggerate the benefits of these new approaches. To tackle this issue,
we systematically explore Graph Autoencoders (GAEs) by applying model-agnostic tricks
in recent methods and tuning hyperparameters. We find that a well-tuned GAE can match
the performance of recent sophisticated models while offering superior computational
efficiency on widely used link prediction benchmarks. Our approach delivers substantial
performance gains on datasets where structural information dominates and feature data
is limited. Specifically, our GAE achieves a state-of-the-art (SOTA) Hits@100 score
of 78.41% on the ogbl-ppa dataset. Furthermore, we examine the impact of various tricks
to uncover the reasons behind our success and to guide the design of future methods.
Our study emphasizes the critical need to update baselines for a more accurate assessment
of progress in GNNs for link prediction. Our code is available at https://github.com/GraphPKU/Refined-GAE.
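For reference, a minimal graph autoencoder for link prediction in plain PyTorch, with one GCN-style propagation step and an inner-product decoder; the toy graph and hyperparameters are illustrative and unrelated to the tuned baselines reported in the paper.

```python
# Minimal GAE sketch: GCN-style encoder + inner-product link decoder.
import torch
import torch.nn as nn

def normalize_adj(adj):
    adj = adj + torch.eye(adj.size(0))          # add self-loops
    deg_inv_sqrt = adj.sum(1).pow(-0.5)
    return deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)

class GAE(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, hid_dim)

    def encode(self, x, adj_norm):
        return torch.relu(adj_norm @ self.lin(x))   # one propagation step

    def decode(self, z, edges):
        return (z[edges[0]] * z[edges[1]]).sum(-1)  # inner-product scores

# Toy graph: 4 nodes connected in a square.
adj = torch.tensor([[0, 1, 0, 1], [1, 0, 1, 0],
                    [0, 1, 0, 1], [1, 0, 1, 0]], dtype=torch.float)
x = torch.eye(4)                                    # one-hot node features
model = GAE(4, 8)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
pos = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]])    # observed edges
neg = torch.tensor([[0, 1], [2, 3]])                # sampled non-edges
for _ in range(100):
    z = model.encode(x, normalize_adj(adj))
    logits = torch.cat([model.decode(z, pos), model.decode(z, neg)])
    labels = torch.cat([torch.ones(4), torch.zeros(2)])
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```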
As Good as It KAN Get: High-Fidelity Audio Representation
- Patryk Marszałek
- Maciej Rut
- Piotr Kawa
- Przemysław Spurek
- Piotr Syga
Implicit neural representations (INR) have gained prominence for efficiently encoding
multimedia data, yet their applications in audio signals remain limited. This study
introduces the Kolmogorov-Arnold Network (KAN), a novel architecture using learnable
activation functions, as an effective INR model for audio representation. KAN demonstrates
superior perceptual performance over previous INRs, achieving the lowest Log-Spectral
Distance of 1.29 and the highest Perceptual Evaluation of Speech Quality of 3.57 for
1.5 s audio. To extend KAN's utility, we propose FewSound, a hypernetwork-based architecture
that enhances INR parameter updates. FewSound outperforms the state-of-the-art HyperSound,
with a 33.3% improvement in MSE and 60.87% in SI-SNR. These results show KAN as a
robust and adaptable audio representation with the potential for scalability and integration
into various hypernetwork frameworks.
Tight Bounds for Jensen's Gap with Applications to Variational Inference
- Marcin Mazur
- Tadeusz Dziarmaga
- Piotr Kościelniak
- Łukasz Struski
Since its original formulation, Jensen's inequality has played a fundamental role
across mathematics, statistics, and machine learning, with its probabilistic version
highlighting the nonnegativity of the so-called Jensen's gap, i.e., the difference
between the expectation of a convex function and the function at the expectation.
Of particular importance is the case when the function is logarithmic, as this setting
underpins many applications in variational inference, where the term variational gap
is often used interchangeably. Recent research has focused on estimating the size
of Jensen's gap and establishing tight lower and upper bounds under various assumptions
on the underlying function and distribution, driven by practical challenges such as
the intractability of log-likelihood in graphical models like variational autoencoders
(VAEs). In this paper, we propose new, general bounds for Jensen's gap that accommodate
a broad range of assumptions on both the function and the random variable, with special
attention to exponential and logarithmic cases. We provide both analytical and empirical
evidence for the performance of our method. Furthermore, we relate our bounds to the
PAC-Bayes framework, providing new insights into generalization performance in probabilistic
models.
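For concreteness, the probabilistic Jensen's gap described above, and its logarithmic special case where it coincides with the variational gap, can be written as follows (standard VAE notation is assumed here, not taken from the paper):

```latex
% Probabilistic Jensen's inequality: the gap between E[f(X)] and f(E[X]).
\[
  \mathcal{J}(f, X) \;=\; \mathbb{E}\!\left[f(X)\right] - f\!\left(\mathbb{E}[X]\right) \;\ge\; 0
  \quad \text{for convex } f .
\]
% Logarithmic case: with z \sim q(z \mid x), the gap between the log-likelihood
% and the ELBO is exactly a Jensen's gap for the (concave) logarithm.
\[
  \log p(x) - \mathrm{ELBO}(x)
  \;=\; \log \mathbb{E}_{q(z \mid x)}\!\left[\tfrac{p(x,z)}{q(z \mid x)}\right]
        - \mathbb{E}_{q(z \mid x)}\!\left[\log \tfrac{p(x,z)}{q(z \mid x)}\right]
  \;\ge\; 0 .
\]
```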
FinD3: A Dual 3D State Space Model with Dynamic Hypergraph for Financial Stock Prediction
- Jieyuan Mei
- Jindong Tian
- Ronghui Xu
- Hanyue Wei
- Chenjuan Guo
- Bin Yang
The financial market plays a crucial role in the modern economy by influencing capital
allocation, corporate valuation, and investor behavior. However, its complex dependencies
and non-stationary dynamics present significant challenges for financial stock prediction.
Previous predictive approaches are typically categorized into Univariate Time Series
(UTS) and Multivariate Time Series (MTS) paradigms. UTS methods overlook both cross-feature
and cross-stock influences, while MTS methods can only capture one of these simultaneously.
Although some recent approaches claim to model 3D Multivariate Time Series (3D-MTS)
dependencies, they often discard substantial information and fail to capture the dynamics
of the stock market. To address these limitations, we propose FinD3, a Financial 3D model using Dual cubic state spaces and Dynamic hypergraphs. To extract
the inherent complex relationships in 3D-MTS, we propose a novel Dual Cubic State
Space Model (DCSSM) to capture both cross-feature and cross-stock patterns. Furthermore,
to more accurately reflect the dynamics of the stock market, we present an Evolving
Hypergraph Attention (EHA) module, which captures dynamic changes in financial markets
and updates the hypergraph based on a priori hypergraph. Experimental results demonstrate
that FinD3 achieves state-of-the-art performance in quantitative trading performance on two
real-world stock market datasets, offering a promising solution to practical quantitative
trading challenges. The code is available at: https://github.com/decisionintelligence/FinD3.
Dense Retrieval for Aggregated Search
- Lang Mei
- Sijie Liu
- Ziyuan Zhao
- Rolan Yan
- Jiaxin Mao
- Ji-rong Wen
To satisfy users' diverse information needs, aggregated search systems need to
integrate heterogeneous results, with rich but different structural information, from
a variety of verticals, such as news search, video search, and product search. A key
challenge in aggregated search is to effectively and efficiently retrieve the most
relevant results from a large amount of heterogeneous information across different
verticals. With the development of deep learning and pre-trained language models (PLMs),
many researchers resort to Dense Retrieval (DR) models for a unified, efficient embedding-based
retrieval and a better retrieval performance. However, existing dense retrieval models
have limitations in: 1) capturing the structural information of search results; and 2) generalizing across different vertical domains where the search results have different
or even unseen structures. In this paper, we aim to tackle these limitations, and propose an effective and efficient
dense retrieval model for aggregated search. Specifically, we utilize a deep prompt-tuning
technique to make the pre-trained model easily applicable to downstream vertical search
tasks. To capture the structural knowledge, we design a Graph Neural Network (GNN)-based
structure prompt that conveys how text segments are organized in the vanilla semi-structured
data. We further incorporate a distributional prompt to model the theme of each domain,
and enhance cross-domain generalization. Extensive experiments on the real-world data
collected from the WeChat Search demonstrate that for aggregated search tasks, our
models achieve better performance than existing retrieval models and show a superior
ability to generalize to varied or even unseen vertical search tasks.
Usefulness and Diminishing Returns: Evaluating Social Information in Recommender Systems
- Qing Meng
- Huiyu Min
- Ming Shan Hee
- Roy Ka-Wei Lee
- Bing Tian Dai
- Shuai Xu
Social recommendation, which leverages users' social information to predict users'
preferences, is a popular branch of recommender systems. Many existing studies have
attempted to advance the performance of collaborative filtering methods by leveraging
the user-user matrix to enhance user embedding learning with users' social connections.
While the existing social recommender systems have demonstrated good performance in
various recommendation tasks, the extent of social information usefulness in recommender
systems remains unclear. This paper addresses the research gap by designing experiments
to answer three research questions: (i) How useful is social information under varying
user-item data sparsity? (ii) How much social information do the existing social recommendation
models use? (iii) How valuable is social information for cold-start situations? Working
towards answering the research questions, we introduce evaluation metrics to estimate
the utilization of social information in the existing social recommendation models.
We conducted experiments on three publicly available social recommendation datasets,
and our results showed that there are diminishing returns when applying social information
in recommender systems.
A Cost-Effective Framework to Evaluate LLM-Generated Relevance Judgements
- Simone Merlo
- Stefano Marchesin
- Guglielmo Faggioli
- Nicola Ferro
Large Language Models (LLMs) hugely impacted many research fields, including Information
Retrieval (IR), where they are used for many sub-tasks, such as query rewriting and
retrieval augmented generation. At the same time, the research community is investigating
whether and how to use LLMs to support, or even replace, humans to generate relevance
judgments. Indeed, generating relevance judgments automatically - or integrating
an LLM in the annotation process - would allow us to increase the number of evaluation
collections, even for scenarios where the annotation process is particularly challenging.
To validate relevance judgments produced by an LLM, they are compared with human-made
relevance judgments, measuring the inter-assessor agreement between the human and
the LLM.
Our work introduces an innovative framework for estimating the quality of LLM-generated
relevance judgments, providing statistical guarantees while minimizing human involvement.
The proposed framework makes it possible to: i) estimate the quality of LLM-generated relevance
judgments with a defined confidence while minimizing human involvement; and ii) estimate
the quality of LLM-generated relevance judgments with a fixed budget while providing
bounds on the estimate. Our experimental results on three well-known IR collections
using multiple LLMs as assessors show that it is sufficient to assess 16% of the LLM-generated
relevance judgments to estimate the LLM's performance with 95% confidence.
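As a rough sketch of the sampling idea (not the paper's framework with its statistical guarantees), one can audit a fraction of LLM judgments against human labels and report a normal-approximation confidence interval for the agreement rate; the sample sizes and data below are invented.

```python
# Audit a sample of LLM judgments and report agreement with a simple CI.
import math, random

def agreement_ci(llm_labels, human_labels, audit_fraction=0.16, z=1.96, seed=0):
    rng = random.Random(seed)
    idx = rng.sample(range(len(llm_labels)), int(audit_fraction * len(llm_labels)))
    agree = [llm_labels[i] == human_labels[i] for i in idx]
    p = sum(agree) / len(agree)
    half = z * math.sqrt(p * (1 - p) / len(agree))   # normal approximation
    return p, (max(0.0, p - half), min(1.0, p + half))

# Toy data: 1,000 judgments where the LLM agrees with humans ~85% of the time.
rng = random.Random(1)
human = [rng.randint(0, 1) for _ in range(1000)]
llm = [h if rng.random() < 0.85 else 1 - h for h in human]
print(agreement_ci(llm, human))
```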
Reverse Chain-of-Thought and Causal Path Verification: A Modular Plugin for Aligning
LLMs with Knowledge Graphs
- Dezhuang Miao
- Yibin Du
- Xiang Li
- Xiaoming Zhang
- Jiahe Li
- Bo Zhang
- Bingyu Yan
- Lian Zhang
- Litian Zhang
Large language models (LLMs) exhibit strong language understanding capabilities, but
encounter challenges when integrating structured knowledge from knowledge graphs (KGs)
for complex reasoning tasks such as knowledge graph question answering (KGQA). Existing
methods often rely on prompt engineering or fixed templates, which obscure the relational
structure and limit generalization. To address these limitations, this paper introduces
the Reverse Chain-of-Thought (R-CoT) and Causal Path Verification Plugin, a modular framework that reconstructs retrieved KG triples into reverse chains
of sub-questions. Each reasoning step is aligned with a supporting triple, forming
interpretable multi-hop paths. In particular, a Semantic Causal Scoring (SCS) module
is further incorporated to evaluate the causal alignment between each reverse sub-question
and the original question through dynamic semantic vector matching. The SCS design
avoids frequent interactions with LLMs and effectively filters irrelevant or unsupported
reasoning steps. Based on the scoring results, a template-free, model-agnostic R-CoT
input format is constructed as a semi-structured sequence. This design preserves the
KG structure in natural language form and enables seamless integration with standard
LLMs without fine-tuning. Experimental results demonstrate that the R-CoT Plugin consistently
improves factual alignment, enhances reasoning stability, and outperforms conventional
prompt-based methods in both accuracy and coherence.
Towards Adaptive Personalized Conversational Information Retrieval
- Fengran Mo
- Yuchen Hui
- Yuxing Tian
- Zhaoxuan Tan
- Chuan Meng
- Zhan Su
- Kaiyu Huang
- Jian-Yun Nie
Personalized conversational information retrieval (CIR) systems aim to satisfy users'
complex information needs through multi-turn interactions by considering user profiles.
However, not all search queries require personalization. The challenge lies in appropriately
incorporating personalization elements into search when needed. Most existing studies
implicitly incorporate users' personal information and conversational context using
large language models without distinguishing the specific requirements for each query
turn. Such a ''one-size-fits-all'' personalization strategy might lead to sub-optimal
results. In this paper, we propose an adaptive personalization method, in which we
first identify the required personalization level for a query and integrate personalized
queries with other query reformulations to produce various enhanced queries. Then,
we design a personalization-aware ranking fusion approach to assign fusion weights
dynamically to different reformulated queries, depending on the required personalization
level. The proposed Adaptive Personalized Conversational Information Retrieval framework
APCIR is evaluated on two TREC iKAT datasets. The results confirm the effectiveness of
adaptive personalization of APCIR by outperforming state-of-the-art methods.
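One simple way to realize personalization-aware ranking fusion, shown here only as an illustrative sketch, is a weighted reciprocal rank fusion in which the personalized query's list receives a weight reflecting the estimated personalization level; the weights and rankings below are invented, not APCIR's learned values.

```python
# Weighted reciprocal rank fusion over reformulated-query rankings.
def weighted_rrf(rankings, weights, k=60):
    """rankings: list of ranked doc-id lists; weights: one weight per ranking."""
    scores = {}
    for ranking, w in zip(rankings, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

generic_rank      = ["d3", "d1", "d2", "d5"]
personalized_rank = ["d7", "d3", "d4", "d1"]
# A high personalization level for this turn -> larger weight on the personalized query.
print(weighted_rrf([generic_rank, personalized_rank], weights=[0.3, 0.7]))
```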
EFU: Enforcing Federated Unlearning via Functional Encryption
- Samaneh Mohammadi
- Vasileios Tsouvalas
- Iraklis Symeonidis
- Ali Balador
- Tanir Ozcelebi
- Francesco Flammini
- Nirvana Meratnia
Federated unlearning (FU) algorithms allow clients in federated settings to exercise
their right to be forgotten by removing the influence of their data from a collaboratively trained model. Existing
FU methods maintain data privacy by performing unlearning locally on the client-side
and sending targeted updates to the server without exposing forgotten data; yet they
often rely on server-side cooperation, revealing the client's intent and identity
without enforcement guarantees - compromising autonomy and unlearning privacy. In
this work, we propose EFU (Enforced Federated Unlearning), a cryptographically enforced FU framework that enables clients to initiate
unlearning while concealing its occurrence from the server. Specifically, EFU leverages
functional encryption to bind encrypted updates to specific aggregation functions,
ensuring the server can neither perform unauthorized computations nor detect or skip
unlearning requests. To further mask behavioral and parameter shifts in the aggregated
model, we incorporate auxiliary unlearning losses based on adversarial examples and
parameter importance regularization. Extensive experiments show that EFU achieves
near-random accuracy on forgotten data while maintaining performance comparable to
full retraining across datasets and neural architectures - all while concealing unlearning
intent from the server. Furthermore, we demonstrate that EFU is agnostic to the underlying
unlearning algorithm, enabling secure, function-hiding, and verifiable unlearning
for any client-side FU mechanism that issues targeted updates.
Learning Optimal Personalised Reservation Prices in Impression Ad Auctions with Mixture
Density Networks
- Dmitrii Moor
- Emma Zetterdahl
- Paul van Vliet
- Zhenwen Dai
- Mounia Lalmas
Reservation prices have proven effective in boosting revenue in Generalised Second
Price (GSP) auctions, particularly in cost-per-click (CPC) settings. However, in domains
like music streaming, where ads are consumed passively without user clicks, a cost-per-impression
(CPM) model is more appropriate. Additionally, in the music streaming domain, user
intent is typically unknown, unlike in sponsored search, making it essential to optimally
leverage all available user and contextual information when setting prices. This paper
addresses the challenge of optimising reservation prices in GSP auctions with CPM
pricing, adopting a personalised approach that accounts for both user- and advertiser-specific
factors.
Using a dataset of 100,000 auctions from a major music streaming service, we determine
such optimal prices. To achieve this, we first derive the symmetric Nash equilibrium
for GSP auctions in a CPM context. We then introduce a Deep Neural Network-based mixture
density model that incorporates this equilibrium into its loss. This model captures
advertisers' diverse preferences by learning directly from bidding data. We show how
this approach enables the computation of personalised prices for both users and advertisers,
boosting auction revenue by an average of +4% across ten markets. Our study further
highlights the impact of market competitiveness and advertiser preference heterogeneity
on revenue gains, showing that personalised pricing greatly enhances auction performance.
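A compact mixture density network sketch in PyTorch, predicting a Gaussian mixture over bids from context features and trained with the mixture negative log-likelihood; the architecture sizes and toy data are assumptions, not the production model or the equilibrium-aware loss described in the paper.

```python
# Mixture density network: Gaussian mixture over bids, trained with mixture NLL.
import torch
import torch.nn as nn

class MDN(nn.Module):
    def __init__(self, in_dim, n_comp=5, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.pi = nn.Linear(hidden, n_comp)         # mixture weights (logits)
        self.mu = nn.Linear(hidden, n_comp)         # component means
        self.log_sigma = nn.Linear(hidden, n_comp)  # component log std-devs

    def forward(self, x):
        h = self.body(x)
        return self.pi(h), self.mu(h), self.log_sigma(h)

def mdn_nll(pi_logits, mu, log_sigma, y):
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    log_prob = dist.log_prob(y.unsqueeze(-1))                       # (B, n_comp)
    log_mix = torch.logsumexp(torch.log_softmax(pi_logits, -1) + log_prob, dim=-1)
    return -log_mix.mean()

model = MDN(in_dim=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(256, 8)            # toy user/context features
y = torch.rand(256) * 10           # toy observed bids
for _ in range(50):
    loss = mdn_nll(*model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```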
Improving Text Embedding Models with Positive-aware Hard-negative Mining
- Gabriel de Souza P. Moreira
- Radek Osmulski
- Mengyao Xu
- Ronay Ak
- Benedikt Schifferer
- Even Oldridge
Text embedding models have been popular for information retrieval applications such
as semantic search and Question-Answering systems based on Retrieval-Augmented Generation
(RAG). Those models are typically Transformer models that are fine-tuned with contrastive
learning objectives. One of the challenging aspects of fine-tuning embedding models
is selecting high quality hard-negative passages for contrastive learning. In this
paper we introduce a family of positive-aware mining methods that use the positive
relevance score as an anchor for false negative removal. Our methods are simple, effective,
scalable, and lead to faster training and more accurate retrieval models. We provide
an ablation study on hard-negative mining methods over their configurations, exploring
different teacher and base models. We further demonstrate the efficacy of our proposed
mining methods at scale with the NV-Retriever-v1 model, which scored 60.9 on the MTEB
Retrieval (BEIR) benchmark and placed 1st upon its publication.
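The positive-aware idea can be sketched as follows: candidate negatives whose relevance score exceeds a fraction of the positive's score are treated as likely false negatives and discarded. The margin value and the word-overlap scorer below are placeholders for illustration, not the paper's teacher models.

```python
# Positive-aware hard-negative filtering: drop candidates scoring near the positive.
def mine_hard_negatives(query, positive, candidates, score_fn,
                        max_neg=4, pos_margin=0.95):
    """candidates: list of passages; score_fn(query, passage) -> relevance score."""
    pos_score = score_fn(query, positive)
    scored = sorted(((score_fn(query, c), c) for c in candidates), reverse=True)
    negatives = [c for s, c in scored if s < pos_margin * pos_score]
    return negatives[:max_neg]

# Toy usage with a stand-in word-overlap scorer, purely for illustration.
def overlap(q, p):
    return len(set(q.split()) & set(p.split()))

q = "how do transformers encode positions"
pos = "transformers encode positions with positional embeddings"
cands = ["positional embeddings let transformers encode positions",  # likely false negative
         "recipes for sourdough bread",
         "attention layers in transformers"]
print(mine_hard_negatives(q, pos, cands, overlap))
```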
Latent Variable Modeling for Robust Causal Effect Estimation
- Tetsuro Morimura
- Tatsushi Oka
- Yugo Suzuki
- Daisuke Moriwaki
Latent variable models provide a powerful framework for incorporating and inferring
unobserved factors in observational data. In causal inference, they help account for
hidden factors influencing treatment or outcome, thereby addressing challenges posed
by missing or unmeasured covariates. This paper proposes a new framework that integrates
latent variable modeling into the double machine learning (DML) paradigm to enable
robust causal effect estimation in the presence of such hidden factors. We consider
two scenarios: one where a latent variable affects only the outcome, and another where
it may influence both treatment and outcome. To ensure tractability, we incorporate
latent variables only in the second stage of DML, separating representation learning
from latent inference. We demonstrate the robustness and effectiveness of our method
through extensive experiments on both synthetic and real-world datasets.
Get Global Guarantees: On the Probabilistic Nature of Perturbation Robustness
Robustness is a critical requirement for deploying machine learning models in safety-sensitive
domains, where even imperceptible input perturbations can lead to hazardous outcomes.
However, existing robustness assessment techniques prior to deployment often face
a trade-off between computational feasibility and measurement precision, limiting
their effectiveness in practice. To address these limitations, we provide a systematic
comparative study of prevailing robustness definitions and their corresponding evaluation
methodologies. Building on this analysis, we propose tower robustness, which is a
novel and practical concept setting out from a global perspective. Further, we provide
upper and lower bounds of tower robustness, based on hypothesis testing, for quantitative
evaluation, enabling more rigorous and efficient pre-deployment assessments. Through
empirical investigation, we demonstrate that our approach provides reliable robustness
assessments. These findings advance the systematic understanding of robustness and
contribute a practical framework for enhancing the safety of machine learning models
in safety-critical applications.
Multi-Ontology Integration with Dual-Axis Propagation for Medical Concept Representation
- Mohsen Nayebi Kerdabadi
- Arya Hadizadeh Moghaddam
- Dongjie Wang
- Zijun Yao
Medical ontology graphs map external knowledge to medical codes in electronic health
records (EHRs) via structured relationships. By leveraging domain-approved connections
(e.g., parent-child), predictive models can generate richer medical concept representations
by incorporating contextual information from related concepts. However, existing literature
primarily focuses on incorporating domain knowledge from a single ontology system,
or from multiple ontology systems (e.g., diseases, drugs, and procedures) in isolation,
without integrating them into a unified learning structure. Consequently, concept
representation learning often remains limited to intra-ontology relationships, overlooking
cross-ontology connections that could enhance the richness of healthcare representations.
In this paper, we propose LINKO, a large language model (LLM)-augmented integrative
ontology learning framework that leverages multiple ontology graphs simultaneously
by enabling dual-axis knowledge propagation both within and across heterogeneous ontology
systems to enhance medical concept representation learning. Specifically, LINKO first
employs LLMs to provide a graph-retrieval-augmented initialization for ontology concept
embedding, through an engineered prompt that includes concept descriptions, and is
further augmented with ontology graph relations and task-specific details. Second,
our method jointly learns the medical concepts in diverse ontology graphs by performing
knowledge propagation in two axes: (1) intra-ontology vertical propagation across
hierarchical ontology levels and (2) inter-ontology horizontal propagation within
every level in parallel. Last, through extensive experiments on two public datasets,
we validate the superior performance of LINKO over state-of-the-art baselines. As
a plug-in encoder compatible with existing EHR predictive models, LINKO further demonstrates
enhanced robustness in scenarios involving limited data availability and rare disease
prediction.
PriviRec: Confidential and Decentralized Graph Filtering for Recommender Systems
- Julien Nicolas
- César Sabater
- Mohamed Maouche
- Mark Coates
- Sonia Ben Mokhtar
Recent advances in recommender systems have shown that relying on graph filters, such
as the normalized item-item adjacency matrix and the ideal low-pass filter yields
competitive performance and scales better than Graph Convolutional Networks-based
solutions. However, these solutions require centralizing user data, which raises concerns
over data privacy, security, and the monopolization of user data by a few actors.
To address those concerns, we propose PriviRec and PriviRec-k, two complementary recommendation frameworks. In PriviRec, we show that it is possible
to decompose widely used filters so that they can be computed in a distributed setting
using Secure Aggregation and a distributed version of the Randomized Power Method,
without revealing individual users' contributions. PriviRec-k extends this approach by having users securely aggregate low-rank projections of
their contributions, enabling a tunable balance between communication overhead and
recommendation accuracy. We demonstrate theoretically as well as experimentally on
Gowalla, Yelp2018, and Amazon-Book that our methods achieve performance comparable
to centralized state-of-the-art recommender systems and superior to decentralized
ones, while preserving confidentiality and low communication and computational overheads.
Addressing Personalized Bias for Unbiased Learning to Rank
- Zechun Niu
- Lang Mei
- Liu Yang
- Ziyuan Zhao
- Qiang Yan
- Jiaxin Mao
- Ji-Rong Wen
Unbiased learning to rank (ULTR), which aims to learn unbiased ranking models from
biased user behavior logs, plays an important role in Web search. Previous research
on ULTR has studied a variety of biases in users' clicks, such as position bias, presentation
bias, and outlier bias. However, existing work often assumes that the behavior logs
are collected from an ''average'' user, neglecting the differences between different
users in their search and browsing behaviors. In this paper, we introduce personalized
factors into the ULTR framework, which we term the user-aware ULTR problem. Through
a formal causal analysis of this problem, we demonstrate that existing user-oblivious
methods are biased when different users have different preferences over queries and
personalized propensities of examining documents. To address such a personalized bias,
we propose a novel user-aware inverse-propensity-score estimator for learning-to-rank
objectives. Specifically, our approach models the distribution of user browsing behaviors
for each query and aggregates user-weighted examination probabilities to determine
propensities. We theoretically prove that the user-aware estimator is unbiased under
some mild assumptions and has lower variance than the straightforward approach
of calculating a user-dependent propensity for each impression. Finally, we empirically
verify the effectiveness of our user-aware estimator by conducting extensive experiments
on two semi-synthetic datasets and a real-world dataset.
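A toy sketch of the user-aware propensity idea: the examination probability at a position is averaged over the user distribution for the query rather than taken from a single average user, and clicked documents are reweighted by its inverse; all numbers and names below are illustrative.

```python
# User-aware inverse propensity weighting (toy sketch).
def user_aware_propensity(position, user_probs, exam_prob_by_user):
    """user_probs: P(user | query); exam_prob_by_user: user -> [p_exam at pos 1..K]."""
    return sum(p_u * exam_prob_by_user[u][position - 1]
               for u, p_u in user_probs.items())

def ips_loss(clicks, user_probs, exam_prob_by_user, rank_loss):
    """clicks: list of (doc, position); rank_loss: doc -> pointwise loss value."""
    total = 0.0
    for doc, pos in clicks:
        prop = user_aware_propensity(pos, user_probs, exam_prob_by_user)
        total += rank_loss(doc) / prop          # inverse-propensity weighting
    return total

exam = {"casual": [1.0, 0.5, 0.2], "expert": [1.0, 0.9, 0.8]}
probs = {"casual": 0.7, "expert": 0.3}
print(user_aware_propensity(2, probs, exam))    # 0.7*0.5 + 0.3*0.9 = 0.62
```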
TANDEM: Temporal Attention-guided Neural Differential Equations for Missingness in
Time Series Classification
- Yongkyung Oh
- Dongyoung Lim
- Sungil Kim
- Alex A.T. Bui
Handling missing data in time series classification remains a significant challenge
in various domains. Traditional methods often rely on imputation, which may introduce
bias or fail to capture the underlying temporal dynamics. In this paper, we propose
TANDEM (Temporal Attention-guided Neural Differential Equations for Missingness),
an attention-guided neural differential equation framework that effectively classifies
time series data with missing values. Our approach integrates raw observation, interpolated
control path, and continuous latent dynamics through a novel attention mechanism,
allowing the model to focus on the most informative aspects of the data. We evaluate
TANDEM on 30 benchmark datasets and a real-world medical dataset, demonstrating its
superiority over existing state-of-the-art methods. Our framework not only improves
classification accuracy but also provides insights into the handling of missing data,
making it a valuable tool in practice.
Disentangling Complex Questions in LLMs via Multi-Hop Dependency Graphs
- Roland Oruche
- Alphaeus Dmonte
- Vani Seth
- Zian Zeng
- Yuanxun Zhang
- Marcos Zampieri
- Prasad Calyam
While large language models (LLMs) have been shown to exhibit remarkable performance in
a wide range of NLP tasks, they often struggle to interpret and reason over multi-hop
questions in open-domain question answering (ODQA) settings. While popular prompt
approaches such as Chain-of-Thought and Plan-and-Solve facilitate more manageable
questions for ODQA via task decomposition, these approaches are prone to generating
erroneous and redundant intermediate steps in multi-hop queries due to limited capacity
for modeling complex entity relationships. In this paper, we introduce a novel prompt
approach for multi-hop QA viz., MoDeGraph (Multi-Hop Dependency Graphs), that is designed to steer LLMs to extract and model
entity relationships in complex questions. MoDeGraph constructs a dependency graph
from LLM-generated entity-relation triples to enable more coherent and human-like
multi-step reasoning. Experimental results in knowledge-intensive tasks for multi-hop
QA demonstrate that our approach produces more coherent and faithful reasoning chains as
well as a consistent increase in QA performance across several benchmark datasets.
Multimodal Sentiment Analysis via Progressive Fusion of Audio-Visual Affective Descriptions
Multimodal Sentiment Analysis (MSA) holds significant research value in the fields
of intelligent human-computer interaction and affective computing. Although existing
MSA approaches have made considerable progress, challenges remain in identifying subtle
emotional distinctions within audio and visual expressions. In particular, conventional
fusion methods have not effectively addressed the difficulty of integrating heterogeneous
modality information. To tackle these challenges, we propose a progressive fusion
framework based on audio-visual affective descriptions for MSA. Specifically, we design
an audio-visual emotional description generator that transforms raw audiovisual data
into textual emotional descriptions, thereby effectively highlighting affective features.
Subsequently, this emotional description is integrated with the original multimodal
features to obtain a richer feature representation. Building upon this, we introduce
a three-stage progressive fusion architecture. First, we employ cross-modal transformers
to facilitate interactions among modalities and to learn inter-modal dependencies.
Second, a gated fusion mechanism is incorporated to effectively eliminate redundant
information and further promote interaction and compatibility among features. Finally,
an attention mechanism is utilized to dynamically adjust the weights of features from
different modalities, enabling effective multimodal sentiment information fusion.
Experimental results on widely used sentiment analysis benchmark datasets, including
MOSI, MOSEI, and CH-SIMS, underscore significant enhancements compared to state-of-the-art
models.
Trustworthy AI Psychotherapy: Multi-Agent LLM Workflow for Counseling and Explainable
Mental Disorder Diagnosis
- Mithat Can Ozgun
- Jiahuan Pei
- Koen Hindriks
- Lucia Donatelli
- Qingzhi Liu
- Junxiao Wang
LLM-based agents have emerged as transformative tools capable of executing complex
tasks through iterative planning and action, achieving significant advancements in
understanding and addressing user needs. Yet, their effectiveness remains limited
in specialized domains such as mental health diagnosis, where they underperform compared
to general applications. Current approaches to integrating diagnostic capabilities
into LLMs rely on scarce, highly sensitive mental health datasets, which are challenging
to acquire. These methods also fail to emulate clinicians' proactive inquiry skills,
lack multi-turn conversational comprehension, and struggle to align outputs with expert
clinical reasoning. To address these gaps, we propose DSM5AgentFlow, the first LLM-based
agent workflow designed to autonomously generate DSM-5 Level-1 diagnostic questionnaires.
By simulating therapist-client dialogues with specific client profiles, the framework
delivers transparent, step-by-step disorder predictions, producing explainable and
trustworthy results. This workflow serves as a complementary tool for mental health
diagnosis, ensuring adherence to ethical and legal standards. Through comprehensive
experiments, we evaluate leading LLMs across three critical dimensions: conversational
realism, diagnostic accuracy, and explainability. Our datasets and implementations
are fully open-sourced.
Contextual Attention Modulation: Towards Efficient Multi-Task Adaptation in Large
Language Models
- Dayan Pan
- Zhaoyang Fu
- Jingyuan Wang
- Xiao Han
- Yue Zhu
- Xiangyu Zhao
Large Language Models (LLMs) possess remarkable generalization capabilities but struggle
with multi-task adaptation, particularly in balancing knowledge retention with task-specific
specialization. Conventional fine-tuning methods suffer from catastrophic forgetting
and substantial resource consumption, while existing parameter-efficient methods perform
suboptimally in complex multi-task scenarios. To address this, we propose Contextual
Attention Modulation (CAM), a novel mechanism that dynamically modulates the representations
of self-attention modules in LLMs. CAM enhances task-specific features while preserving
general knowledge, thereby facilitating more effective and efficient adaptation. For
effective multi-task adaptation, CAM is integrated into our Hybrid Contextual Attention
Modulation (HyCAM) framework, which combines a shared, full-parameter CAM module with
multiple specialized, lightweight CAM modules, enhanced by a dynamic routing strategy
for adaptive knowledge fusion. Extensive experiments on heterogeneous tasks, including
question answering, code generation, and logical reasoning, demonstrate that our approach
significantly outperforms existing methods, achieving an average performance improvement
of 3.65%. The implemented code and data are available to ease reproducibility.
MRCLQR: A Framework for Logical Query Reasoning Based on Multi-information Relation
Constraints
- Pengwei Pan
- Yu Liu
- Jun Ma
- Jianfeng Qu
- Wen Hua
- Yanmei Kang
The Knowledge Graph logical reasoning task faces a dual challenge of insufficient
semantic coverage from type information and missing structural information from relations.
Although type annotations provide semantic priors for entities, their coarse-grained
features cannot comprehensively characterize entity attributes; conversely, relational
structure can enhance semantic representation, but the incompleteness of edges in
real-world graphs limits modeling when relying on a single information source. To
address these issues, we propose MRCLQR (Multi-information Relation Constraint-based
Logical Query Reasoning), a framework with three core innovations: (1) an Information
Semantic Alignment module based on contrastive learning, which achieves cross-modal
semantic collaboration via entity-type-structure pairing; (2) a Constraint-aware Relation
Encoding method that decomposes relation semantics into domain aggregation features,
relation ontology semantics, and range constraint features; and (3) Neural-Symbolic
Operators guided by domain constraints, which narrow the reasoning space through a
constraint-aware attention mechanism. Experiments on FB15k, FB15k-237, and NELL-995
demonstrate that MRCLQR achieves average MRR scores of 35.8%, 16.2%, and 19.6%, respectively,
improving over the strongest baselines by 0.5%, 0.2%, and 0.2%, and exhibits an
8.0% average gain on complex queries involving negation. Ablation studies validate
the effectiveness of multi-source collaboration and the curriculum learning strategy.
This work offers a novel paradigm for heterogeneous knowledge fusion and logical query
reasoning.
Revisiting Long-Tailed Learning: Insights from an Architectural Perspective
- Yuhan Pan
- Yanan Sun
- Wei Gong
Long-Tailed (LT) recognition has been widely studied to tackle the challenge of imbalanced
data distributions in real-world applications. However, the design of neural architectures
for LT settings has received limited attention, despite evidence showing that architecture
choices can substantially affect performance. This paper aims to bridge the gap between
LT challenges and neural network design by providing an in-depth analysis of how various
architectures influence LT performance. Specifically, we systematically examine the
effects of key network components on LT handling, such as topology, convolutions,
and activation functions. Based on these observations, we propose two convolutional
operations optimized for improved performance. Recognizing that operation interactions
are also crucial to network effectiveness, we apply Neural Architecture Search (NAS)
to facilitate efficient exploration. We propose LT-DARTS, a NAS method with a novel
search space and search strategy specifically designed for LT data. Experimental results
demonstrate that our approach consistently outperforms existing architectures across
multiple LT datasets, achieving parameter-efficient, state-of-the-art results when
integrated with current LT methods.
Temporal Distance-aware Subgoal Generation for Offline Hierarchical Reinforcement
Learning
- Taegeon Park
- Seungho Baek
- Jongchan Park
- Seungjun Oh
- Yusung Kim
Efficient subgoal generation is essential in offline Hierarchical Reinforcement
Learning (HRL) for tackling long-horizon and sparse-reward tasks. Existing approaches
often struggle with redundant and inefficient subgoal candidates and fail to maintain
meaningful temporal relationships due to fixed-step subgoal sampling. To address
these issues, we propose Temporal Distance-Aware Subgoal Generation (TDSG), a novel
framework leveraging pre-trained Temporal Distance (TD) representations. TDSG identifies
a compact set of anchor states in the TD representation space. These states, evenly
spaced at consistent temporal distance intervals and collectively covering all states
in the dataset while comprising less than 1% of the entire dataset, serve as the training
targets for subgoal generation. This ensures efficient and temporally consistent
high-level policy learning. Furthermore, the low-level policy leverages intrinsic
rewards derived from the alignment between current states and subgoals in the TD representation
space, enabling effective learning even under sparse-reward conditions. Experimental
results demonstrate that TDSG achieves consistent performance improvement over prior
offline HRL methods across numeric and visual environments. Our code is available
at https://github.com/Ptaegeon/TDSG.git.
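The abstract describes intrinsic rewards derived from alignment between states and subgoals in the TD
representation space without specifying the metric; a minimal sketch of one such alignment-based reward,
with the Euclidean distance as an assumption, is:

```python
import torch

def intrinsic_reward(phi_state, phi_subgoal):
    """Reward the low-level policy for moving close to the subgoal in the
    pre-trained temporal-distance (TD) representation space. The negative
    Euclidean distance is an illustrative choice, not the paper's exact reward."""
    return -torch.linalg.norm(phi_state - phi_subgoal, dim=-1)
```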
How Fair is FAIR? Understanding LOD Cloud FAIRness Through Correlation Patterns
- Maria Angela Pellegrino
- Gabriele Tuozzo
While the FAIR principles (Findability, Accessibility, Interoperability, and Reusability)
and data quality dimensions are widely used to evaluate Linked Data, their interdependencies
remain largely unexplored. This paper is grounded in a systematic integration of these
two frameworks by mapping data quality dimensions to FAIR sub-principles, revealing
how individual features---such as endpoint availability, metadata richness, or use
of standard vocabularies---can simultaneously contribute to multiple FAIR goals. Building
on this mapping, this paper reports a large-scale, data-driven, longitudinal study
of 1,445 datasets from the LOD Cloud extending KGHeartBeat, an open-source quality
assessment framework. This paper quantifies FAIRness at the sub-principle level and
computes correlation patterns across five temporal snapshots and nine topical domains.
The reported findings reveal that most correlations are positive and statistically
significant but vary across time and domain, with only a few stable or persistent
relationships. Strong inter-principle correlations, such as those linking metadata
standards and security transparency, emerge over time, while intra-principle coherence
is often weak. These insights offer concrete guidance for improving FAIR compliance,
highlight the importance of domain-aware evaluation, and support the development of
more holistic and reproducible FAIR assessment strategies for Linked Data ecosystems.
Dialogues Aspect-based Sentiment Quadruple Extraction via Structural Entropy Minimization
Partitioning
- Kun Peng
- Cong Cao
- Hao Peng
- Zhifeng Hao
- Lei Jiang
- Kongjing Gu
- Yanbing Liu
- Philip S. Yu
Dialogues Aspect-based Sentiment Quadruple Extraction (DiaASQ) aims to extract all
target-aspect-opinion-sentiment quadruples from a given multi-round, multi-participant
dialogue. Existing methods typically learn word relations across entire dialogues,
assuming a uniform distribution of sentiment elements. However, we find that dialogues
often contain multiple semantically independent sub-dialogues without clear dependencies
between them. Therefore, learning word relationships across the entire dialogue inevitably
introduces additional noise into the extraction process. To address this, our method
focuses on partitioning dialogues into semantically independent sub-dialogues. Achieving
completeness while minimizing these sub-dialogues presents a significant challenge.
Simply partitioning based on reply relationships is ineffective. Instead, we propose
utilizing a structural entropy minimization algorithm to partition the dialogues.
This approach aims to preserve relevant utterances while distinguishing irrelevant
ones as much as possible. Furthermore, we introduce a two-step framework for quadruple
extraction: first extracting individual sentiment elements at the utterance level,
then matching quadruples at the sub-dialogue level. Extensive experiments demonstrate
that our approach achieves state-of-the-art performance in DiaASQ with much lower
computational costs.
Data-centric Prompt Tuning for Dynamic Graphs
- Yufei Peng
- Cheng Yang
- Zhengjie Fan
- Chuan Shi
Dynamic graphs have attracted increasing attention due to their ability to model complex
and evolving relationships in real-world scenarios. Traditional approaches typically
pre-train models using dynamic link prediction and directly apply the resulting node
temporal embeddings to specific downstream tasks. However, the significant differences
among downstream tasks often lead to performance degradation, especially under few-shot
settings. Prompt tuning has emerged as an effective solution to this problem. Existing
prompting methods are often strongly coupled with specific model architectures or
pretraining tasks, which makes it difficult to adapt to recent or future model designs.
Moreover, their exclusive focus on modifying node or temporal features while neglecting
spatial structural information leads to limited expressiveness and degraded performance.
To address these limitations, we propose DDGPrompt, a data-centric prompting framework
designed to effectively refine pre-trained node embeddings at the input data level,
enabling better adaptability to diverse downstream tasks. We first define a unified
node expression feature matrix that aggregates all relevant temporal and structural
information of each node, ensuring compatibility with a wide range of dynamic graph
models. Then, we introduce three prompt matrices (temporal bias, edge weight, and
feature mask) to comprehensively adjust the feature matrix, achieving task-specific adaptation
of node embeddings. We evaluate DDGPrompt under a strict few-shot setting on four
public dynamic graph datasets. Experimental results demonstrate that our method significantly
outperforms traditional methods and prompting approaches in scenarios with limited
labels and cold-start conditions.
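The exact parameterization of the three prompt matrices is not given in the abstract; the sketch below
only illustrates the data-centric idea, with assumed shapes and application rules: a temporal bias added
to the unified feature matrix, an edge-weight prompt rescaling aggregation, and a soft feature mask.

```python
import numpy as np

def apply_prompts(node_feats, adj, temporal_bias, edge_weight, feature_mask):
    """Illustrative data-level prompting (shapes and rules are assumptions).

    node_feats   : (n_nodes, d) unified node expression features
    adj          : (n_nodes, n_nodes) interaction/adjacency matrix
    temporal_bias: (n_nodes, d) additive prompt on temporal features
    edge_weight  : (n_nodes, n_nodes) multiplicative prompt on edges
    feature_mask : (d,) soft mask selecting feature dimensions
    """
    x = (node_feats + temporal_bias) * feature_mask   # adjust node features
    a = adj * edge_weight                             # reweight structure
    return a @ x                                      # one aggregation step on prompted data

n, d = 4, 3
x = np.random.randn(n, d)
adj = (np.random.rand(n, n) > 0.5).astype(float)
prompted = apply_prompts(x, adj,
                         temporal_bias=np.zeros((n, d)),
                         edge_weight=np.ones((n, n)),
                         feature_mask=np.ones(d))
```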
DistillCaps: Enhancing Audio-Language Alignment in Captioning via Retrieval-Augmented
Knowledge Distillation
- Thinh Pham
- Nghiem Diep
- Lizi Liao
- Binh T. Nguyen
Automated audio captioning (AAC) benefits from incorporating external context to interpret
complex sounds, but doing so with retrieval-augmented generation (RAG) at inference
is sometimes infeasible due to data availability or incurs significant latency and
complexity. We propose DistillCaps, a novel training-time framework that leverages RAG to guide knowledge distillation
for improved audio-language alignment, while lessening the reliance on retrieval during
inference. In our framework, a RAG-equipped teacher model retrieves relevant textual
information (e.g., similar captions) for each audio clip and uses it for training to generate context-enriched
captions. Simultaneously, a student model is trained to imitate this teacher, learning
to produce high-quality captions from audio alone. We further introduce a Fast Fourier
Transform (FFT) adapter in the audio encoder to inject frequency-domain features,
enhancing the quality of audio representations before feeding them into the language
model. The result is an efficient captioning model that retains RAG's contextual benefits
without its deployment overhead. On standard AAC benchmarks (AudioCaps and Clotho),
DistillCaps achieves performance competitive with or exceeding prior RAG-based systems
despite using no retrieval at test time. Notably, our distilled model matches state-of-the-art
captioning results under real-time settings, and when optionally allowing retrieval,
it even outperforms previous models by up to 4% on the Clotho benchmark on the in-distribution
setting, demonstrating the effectiveness of RAG-guided distillation for audio-language
alignment. Code and dataset are available at https://github.com/pgthinh/DistillCaps.
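The abstract describes training a retrieval-free student to imitate a RAG-equipped teacher; a minimal
sketch of a generic caption distillation objective, combining caption cross-entropy with a KL term to the
teacher's token distributions, is shown below. The loss form and its weighting are assumptions, not the
paper's exact objective.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, target_ids, alpha=0.5, T=2.0):
    """Generic caption distillation objective (a sketch, not the paper's exact loss).

    student_logits : (batch, seq_len, vocab) from the audio-only student
    teacher_logits : (batch, seq_len, vocab) from the RAG-equipped teacher
    target_ids     : (batch, seq_len) ground-truth caption tokens
    """
    # supervised caption loss for the student
    ce = F.cross_entropy(student_logits.transpose(1, 2), target_ids)
    # match the teacher's context-enriched token distributions
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    return alpha * ce + (1.0 - alpha) * kl
```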
Oblivious Johnson--Lindenstrauss embeddings for compressed Tucker decompositions
- Matthew Pietrosanu
- Bei Jiang
- Linglong Kong
Emphasis in the tensor literature on random embeddings (tools for low-distortion dimension
reduction) for the canonical polyadic (CP) tensor decomposition has left analogous
results for the more expressive Tucker decomposition comparatively lacking. This work
establishes general Johnson--Lindenstrauss (JL) guarantees for the estimation of Tucker
decompositions when an oblivious random embedding is applied along each mode. When
these embeddings are drawn from a JL-optimal family, the decomposition can be estimated
within ε relative error under restrictions on the embedding dimension that are in
line with recent CP results. We implement a higher-order orthogonal iteration (HOOI)
decomposition algorithm with structured random embeddings to demonstrate the practical
benefits of this approach and its potential to improve the accessibility of otherwise
prohibitive tensor analyses. On moderately large face image and fMRI neuroimaging
datasets, empirical results show that substantial dimension reduction is possible
with minimal increase in reconstruction error relative to traditional HOOI (łeq15%
larger error, 50% lower computation time for large models with 50% dimension reduction
along each mode). Especially for large tensors, our method outperforms traditional
higher-order singular value decomposition (HOSVD) and recently proposed TensorSketch
methods.
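A minimal sketch of the compression idea discussed above: an oblivious Gaussian JL embedding is applied
along each mode and a plain HOSVD is run on the compressed tensor. The sketch dimensions, the use of
HOSVD instead of the paper's structured HOOI variant, and the toy sizes are all assumptions for
illustration only.

```python
import numpy as np

def mode_unfold(T, mode):
    """Unfold tensor T along the given mode into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def sketch_modes(T, sketch_dims, rng=None):
    """Apply an independent Gaussian JL embedding along each mode of T."""
    rng = np.random.default_rng(rng)
    S = T
    for mode, k in enumerate(sketch_dims):
        n = S.shape[mode]
        omega = rng.standard_normal((k, n)) / np.sqrt(k)   # draw from a JL-optimal family
        S = np.moveaxis(np.tensordot(omega, S, axes=([1], [mode])), 0, mode)
    return S

def hosvd_factors(T, ranks):
    """Leading left singular vectors of each unfolding (plain HOSVD)."""
    return [np.linalg.svd(mode_unfold(T, m), full_matrices=False)[0][:, :r]
            for m, r in enumerate(ranks)]

T = np.random.randn(60, 50, 40)
compressed = sketch_modes(T, sketch_dims=(30, 25, 20), rng=0)
factors = hosvd_factors(compressed, ranks=(5, 5, 5))
print([U.shape for U in factors])
```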
Evaluating Robustness of LLMs in Question Answering on Multilingual Noisy OCR Data
- Bhawna Piryani
- Jamshid Mozafari
- Abdelrahman Abdallah
- Antoine Doucet
- Adam Jatowt
Optical Character Recognition (OCR) plays a crucial role in digitizing historical
and multilingual documents, yet OCR errors, i.e., imperfect extraction of text including
character insertions, deletions, and substitutions, can significantly impact downstream
tasks like question answering (QA). In this work, we conduct a comprehensive analysis
of how OCR-induced noise affects the performance of multilingual QA systems. To support
this analysis, we introduce MultiOCR-QA, a multilingual QA dataset comprising 50K
question-answer pairs across three languages: English, French, and German. The dataset
is curated from OCR-ed historical documents, which include different levels and types
of OCR noise. We then evaluate how different state-of-the-art Large Language Models
(LLMs) perform under different error conditions, focusing on three major OCR error
types. Our findings show that QA systems are highly prone to OCR-induced errors and
perform poorly on noisy OCR text. By comparing model performance on clean versus noisy
texts, we provide insights into the limitations of current approaches and emphasize
the need for more noise-resilient QA systems in historical digitization contexts.
Do Recommender Systems Really Leverage Multimodal Content? A Comprehensive Analysis
on Multimodal Representations for Recommendation
- Claudio Pomo
- Matteo Attimonelli
- Danilo Danese
- Fedelucio Narducci
- Tommaso Di Noia
Multimodal Recommender Systems aim to improve recommendation accuracy by integrating
heterogeneous content, such as images and textual metadata. While effective, it remains
unclear whether their gains stem from true multimodal understanding or increased model
complexity. This work investigates the role of multimodal item embeddings, emphasizing
the semantic informativeness of the representations. Initial experiments reveal that
embeddings from standard extractors (e.g., ResNet50, Sentence-BERT) enhance performance, but rely on modality-specific encoders and ad hoc fusion strategies
that lack control over cross-modal alignment. To overcome these limitations, we leverage
Large Vision-Language Models (LVLMs) to generate multimodal-by-design embeddings via structured prompts. This approach yields semantically aligned representations
without requiring any fusion. Experiments across multiple settings show notable performance
improvements. Furthermore, LVLM embeddings offer a distinctive advantage: they can
be decoded into structured textual descriptions, enabling direct assessment of their
multimodal comprehension. When such descriptions are incorporated as side content
into recommender systems, they improve recommendation performance, empirically validating
the semantic alignment encoded in LVLM outputs. Our study highlights the importance
of semantically rich representations and positions LVLMs as a compelling foundation
to build robust and meaningful multimodal representations in recommendation tasks.
Constraint Back-translation Improves Complex Instruction Following of Large Language
Models
- Yunjia Qi
- Hao Peng
- Xiaozhi Wang
- Bin Xu
- Lei Hou
- Juanzi Li
Large language models (LLMs) struggle to follow instructions with complex constraints
in format, length, etc. Following the conventional instruction-tuning practice, previous
works conduct post-training on complex instruction-response pairs generated by feeding
complex instructions to advanced LLMs. However, even advanced LLMs cannot follow complex
instructions well, thus limiting the quality of generated data. In this work, we find
that existing datasets inherently contain implicit complex constraints and propose
a novel data generation technique, constraint back-translation. Specifically, we take
the high-quality instruction-response pairs in existing datasets and only adopt advanced
LLMs to add complex constraints already met by the responses to the instructions,
which naturally reduces costs and data noise. In the experiments, we adopt Llama3-70B-Instruct
to back-translate constraints and create a high-quality complex instruction-response
dataset, named Crab. We show that post-training on Crab improves multiple backbone
LLMs' complex instruction-following ability, evaluated on extensive instruction-following
benchmarks. We further find that constraint back-translation also serves as a useful
auxiliary training objective in post-training. Our code, data, and models are released
to facilitate future research.
Personalized Federated Recommendation with Multi-Faceted User Representation and Global
Consistent Prototype
- Jiaming Qian
- Xinting Liao
- Xiangmou Qu
- Zhihui Fu
- Xingyu Lou
- Changwang Zhang
- Pengyang Zhou
- Zijun Zhou
- Jun Wang
- Chaochao Chen
Personalized recommender systems are critical for enhancing user engagement across
a range of digital platforms. However, conventional approaches rely heavily on centralized
data collection, raising significant privacy concerns. Personalized federated recommender
systems (PFRS) address these concerns by decentralizing model training, ensuring user data
privacy. Despite this progress, existing methods still struggle with capturing the
multi-faceted nature of users and transferring global knowledge effectively. In this
work, we propose FedMUR, a novel federated recommendation framework that models user
representation as a Gaussian mixture distribution, capturing users' multi-faceted
characteristics. Each Gaussian component corresponds to a distinct interest facet,
with adaptive mixture weights representing the user's preference intensity toward
each facet. To facilitate knowledge transfer, FedMUR constructs global consistent
prototypes that encode shared behavioral trends across users via popularity-weighted
optimal transport. These prototypes enhance local models by injecting global shared
patterns into personalized representation learning. Extensive experiments across several
real-world datasets demonstrate that FedMUR significantly outperforms existing state-of-the-art
federated recommendation systems.
TCFMamba: Trajectory Collaborative Filtering Mamba for Debiased Point-of-Interest
Recommendation
- Jin Qian
- Shiyu Song
- Xin Zhang
- Dongjing Wang
- He Weng
- Haiping Zhang
- Dongjin Yu
Next Point-of-Interest (POI) recommendation, which predicts users' future destinations
based on their potential interests, has emerged as a critical task in location-based
social networks (LBSNs). However, this task remains challenged by issues such as popularity bias, exposure bias, and limited representational capacity, all of which impede the accurate modeling of users and POIs, thereby restricting
balanced and effective recommendations. Therefore, we propose Trajectory Collaborative
Filtering Mamba (TCFMamba), which integrates two specially designed modules, i.e.,
Joint Learning of Static and Dynamic Representations (JLSDR) and Preference State
Mamba Network (PSMN), for debiased Point-of-Interest recommendation.
Invariant Treatment Effect Estimation via Consistent Constraints and Information Bottleneck
Considerable research has focused on the challenge of estimating individual treatment
effects (ITE) from observational data, primarily due to the presence of treatment
assignment bias. To address this, practitioners often adjust for relevant covariates
to correct potential biases. However, indiscriminately adjusting for all observed
covariates risks including 'bad controls', i.e., variables that can introduce bias when conditioned
upon, thereby compromising ITE estimation accuracy. To tackle this issue, we propose
Invariant Treatment Effect Estimation via Consistent Constraints and Information Bottleneck
(CIBITE). This method mitigates the impact of bad controls by leveraging diverse environments
and adjusting for confounding factors in observational data, enabling robust ITE estimation.
We introduce a novel invariant causal prediction framework to eliminate bad controls
while retaining sufficient information for confounding adjustment. This is achieved
by imposing consistent constraints on both the representation and output layers of
the neural network. Additionally, the Information Bottleneck is employed to reduce
the influence of pseudo-invariant features. To further address confounding, we propose
a balanced representation learning framework using adversarial training. Extensive
experiments on synthetic, semi-simulated, and real-world datasets demonstrate the
effectiveness of our approach. The proposed method significantly outperforms state-of-the-art
ITE estimation techniques and existing Invariant Risk Minimization (IRM)-based methods.
OBDD-NET: End-to-End Learning of Ordered Binary Decision Diagrams
- Junming Qiu
- Rongzhen Ye
- Weilin Luo
- Kunxun Qi
- Hai Wan
- Yue Yu
Learning Ordered Binary Decision Diagrams (OBDDs) from large-scale datasets is an
important topic of explainable artificial intelligence. However, existing search-based
methods are still limited in scalability regarding dataset size, since they must explicitly
encode the satisfaction of all examples in a dataset. To tackle this challenge, we
introduce an OBDD encoding method to parameterize a neural network. This method avoids
explicitly encoding the satisfaction of all examples in a dataset while leveraging mini-batch
training techniques to enhance learning efficiency. Our main theoretical contribution is to
prove that our approach enables the simulation of OBDD inference within a continuous
space. Besides, we identify a faithful OBDD encoding that fulfills the properties required
by OBDDs, allowing an OBDD to be interpreted directly from the learned parameter assignment.
With faithful OBDD encoding, we present an end-to-end neural model named OBDD-NET,
capable of coping with large-scale datasets. Experimental results exhibit better
scalability and competitive prediction performance of OBDD-NET compared to state-of-the-art
OBDD learners. Valuable insights about faithful OBDD encoding are derived from the
ablation study. The implementation is available at: https://github.com/jmq-design/OBDD-NET.
UniROM: Unifying Online Advertising Ranking as One Model
- Junyan Qiu
- Ze Wang
- Fan Zhang
- Zuowu Zheng
- Jile Zhu
- Jiangke Fan
- Teng Zhang
- Haitao Wang
- Xingxing Wang
The Multi-stage Cascading Architecture (MCA), widely adopted in industrial advertising
systems to balance efficiency and effectiveness, suffers from critical limitations:
1) ranking inconsistency caused by conflicting modeling objectives and capacity gaps
across stages, and 2) the inability to model externalities, i.e., mutual influences among
candidate ads in ranking stages. These issues degrade system performance and lead
to suboptimal platform revenue. In this paper, we present UniROM, an end-to-end generative architecture that Unifies online advertising Ranking as One Model. UniROM replaces cascaded stages with a single model to directly generate optimal
ad sequences from the full candidate ad corpus in location-based services (LBS). The
primary challenges associated with this approach stem from high costs of feature processing
and computational bottlenecks in modeling externalities of large-scale candidate pools.
To address these challenges, UniROM introduces an algorithm and engine co-designed
hybrid feature service to decouple user and ad feature processing, reducing latency
while preserving expressiveness. To efficiently extract intra- and cross-sequence
mutual information, we propose RecFormer with an innovative cluster-attention mechanism
as its core architectural component. Furthermore, we propose a bi-stage training strategy
that integrates pre-training with reinforcement learning-based post-training to meet
sophisticated platform and advertising objectives. Extensive offline evaluations on
public benchmarks and large-scale online A/B testing on an industrial advertising platform
demonstrate the superior performance of UniROM over state-of-the-art MCAs.
GRLND: A Graph Reinforcement Learning Framework for Network Dismantling
- Hongbo Qu
- Xu Wang
- Yu-Rong Song
- Wei Ni
- Guo-Ping Jiang
- Quan Z. Sheng
Network Dismantling (ND) seeks to identify the smallest subset of nodes whose removal
fragments a network into disconnected components. Traditional methods rely on fixed
centrality heuristics or supervised models trained on synthetic data, often failing
to generalize across diverse topologies. We introduce GRLND, a Graph Reinforcement
Learning framework that enables fully unsupervised, structure-aware dismantling through
end-to-end optimization. GRLND formulates ND as a single-step Markov Decision Process
(MDP), where the action is a binary mask indicating the nodes to be removed, allowing
the agent to generate a complete dismantling strategy in a single forward pass while
accounting for the joint effect of multiple node removals. The framework combines
a Graph Convolutional Network (GCN) for topological encoding with a stochastic policy
trained via the REINFORCE algorithm. Additionally, we design a task-specific reward
that balances connectivity disruption and removal sparsity, guiding the policy toward
compact yet high-impact dismantling solutions. Experiments on both synthetic and real-world
networks show that GRLND consistently outperforms classical heuristics and recent
learning-based methods, achieving strong generalization without requiring labels or
pretraining.
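A minimal single-step REINFORCE sketch in the spirit of the framework above is given below. The two-layer
GCN, the reward weighting, and the toy graph are assumptions for illustration; the paper's architecture
and task-specific reward are more elaborate.

```python
import torch
import networkx as nx

def gcn_policy(A_hat, X, W1, W2):
    """Two-layer GCN producing per-node removal probabilities."""
    H = torch.relu(A_hat @ X @ W1)
    return torch.sigmoid(A_hat @ H @ W2).squeeze(-1)          # (n_nodes,)

def reward(G, mask, lam=0.5):
    """Connectivity disruption (drop in largest component) minus removal cost."""
    n = G.number_of_nodes()
    kept = [v for v in G.nodes if not mask[v]]
    lcc = max((len(c) for c in nx.connected_components(G.subgraph(kept))), default=0)
    return (1.0 - lcc / n) - lam * mask.float().mean().item()

G = nx.barabasi_albert_graph(50, 2, seed=0)
A = torch.tensor(nx.to_numpy_array(G), dtype=torch.float32)
A_hat = A + torch.eye(A.shape[0])
A_hat = A_hat / A_hat.sum(dim=1, keepdim=True)                # row-normalised adjacency
X = torch.eye(A.shape[0])                                     # identity node features
W1 = torch.randn(A.shape[0], 16, requires_grad=True)
W2 = torch.randn(16, 1, requires_grad=True)
opt = torch.optim.Adam([W1, W2], lr=1e-2)

for step in range(200):
    p = gcn_policy(A_hat, X, W1, W2)
    mask = torch.bernoulli(p)                                 # sample a removal set
    logp = (mask * torch.log(p + 1e-8) + (1 - mask) * torch.log(1 - p + 1e-8)).sum()
    r = reward(G, mask.bool())
    loss = -r * logp                                          # REINFORCE gradient estimator
    opt.zero_grad(); loss.backward(); opt.step()
```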
Efficient Multimodal Streaming Recommendation via Expandable Side Mixture-of-Experts
- Yunke Qu
- Liang Qu
- Tong Chen
- Quoc Viet Hung Nguyen
- Hongzhi Yin
Streaming recommender systems (SRSs) are widely deployed in real-world applications,
where user interests shift and new items arrive over time. As a result, effectively
capturing users' latest preferences is challenging, as interactions reflecting recent
interests are limited and new items often lack sufficient feedback. A common solution
is to enrich item representations using multimodal encoders (e.g., BERT or ViT) to
extract visual and textual features. However, these encoders are pretrained on general-purpose
tasks: they are not tailored to user preference modeling, and they overlook the fact
that user tastes toward modality-specific features such as visual styles and textual
tones can also drift over time. This presents two key challenges in streaming scenarios:
the high cost of fine-tuning large multimodal encoders, and the risk of forgetting
long-term user preferences due to continuous model updates. To tackle these challenges,
we propose Expandable Side Mixture-of-Experts (XSMoE), a memory-efficient framework
for multimodal streaming recommendation. XSMoE attaches lightweight side-tuning modules
consisting of expandable expert networks to frozen pretrained encoders and incrementally
expands them in response to evolving user feedback. A gating router dynamically combines
expert and backbone outputs, while a utilization-based pruning strategy maintains
model compactness. By learning new patterns through expandable experts without overwriting
previously acquired knowledge, XSMoE effectively captures both cold start and shifting
preferences in multimodal features. Experiments on three real-world datasets demonstrate
that XSMoE outperforms state-of-the-art baselines in both recommendation quality and
computational efficiency.
Causality-aware Graph Aggregation Weight Estimator for Popularity Debiasing in Top-K
Recommendation
- Yue Que
- Yingyi Zhang
- Xiangyu Zhao
- Chen Ma
Graph-based recommender systems leverage neighborhood aggregation to generate node
representations, which is highly sensitive to popularity bias, resulting in an echo
effect during information propagation. Existing graph-based debiasing solutions refine
the aggregation process with attempts such as edge reconstruction or weight adjustment.
However, these methods remain inadequate in fully alleviating popularity bias. Specifically,
this is because 1) they provide no insights into graph aggregation rationality, thus
lacking an optimality guarantee; 2) they fail to properly balance the training and debiasing
processes, which undermines effectiveness.
In this paper, we propose a novel approach to mitigate popularity bias through rational
modeling of the graph aggregation process. We reveal that graph aggregation is a special
form of backdoor adjustment in causal inference, where the aggregation weight corresponds
to the historical interaction likelihood distribution. Based on this insight, we devise
an encoder-decoder architecture, namely Causality-aware Graph Aggregation Weight Estimator
for Debiasing (CAGED), to approximate the unbiased aggregation weight by optimizing
the evidence lower bound of the interaction likelihood. In order to enhance the debiasing
effectiveness during early training stages, we further design a momentum update strategy
that incrementally refines the aggregation weight matrix. Extensive experiments on
three datasets demonstrate that CAGED outperforms existing graph-based debiasing methods.
Our implementation is available at https://github.com/QueYork/CAGED.
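The abstract mentions a momentum strategy that incrementally refines the aggregation weight matrix; a
minimal sketch of one such exponential-moving-average style update is shown below. The momentum
coefficient and the row normalisation are assumptions, not the paper's exact rule.

```python
import numpy as np

def momentum_update(W_agg, W_new, momentum=0.9):
    """Incrementally refine an aggregation weight matrix.

    W_agg : current aggregation weights (e.g., user-item), rows normalised
    W_new : weights estimated by the encoder-decoder at the current step
    """
    W = momentum * W_agg + (1.0 - momentum) * W_new
    return W / np.clip(W.sum(axis=1, keepdims=True), 1e-12, None)  # keep rows normalised
```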
ITL-LIME: Instance-Based Transfer Learning for Enhancing Local Explanations in Low-Resource
Data Settings
- Rehan Raza
- Guanjin Wang
- Kok Wai Wong
- Hamid Laga
- Marco Fisichella
Explainable Artificial Intelligence (XAI) methods, such as Local Interpretable Model-Agnostic
Explanations (LIME), have advanced the interpretability of black-box machine learning
models by approximating their behavior locally using interpretable surrogate models.
However, LIME's inherent randomness in perturbation and sampling can lead to locality
and instability issues, especially in scenarios with limited training data. In such
cases, data scarcity can result in the generation of unrealistic variations and samples
that deviate from the true data manifold. Consequently, the surrogate model may fail
to accurately approximate the complex decision boundary of the original model. To
address these challenges, we propose a novel Instance-based Transfer Learning LIME
framework (ITL-LIME) that enhances explanation fidelity and stability in data-constrained
environments. ITL-LIME introduces instance transfer learning into the LIME framework
by leveraging relevant real instances from a related source domain to aid the explanation
process in the target domain. Specifically, we employ clustering to partition the
source domain into clusters with representative prototypes. Instead of generating
random perturbations, our method retrieves pertinent real source instances from the
source cluster whose prototype is most similar to the target instance. These are then
combined with the target instance's neighboring real instances. To define a compact
locality, we further construct a contrastive learning-based encoder as a weighting
mechanism to assign weights to the instances from the combined set based on their
proximity to the target instance. Finally, these weighted source and target instances
are used to train the surrogate model for explanation purposes. Experimental evaluation
with real-world datasets demonstrates that ITL-LIME greatly improves the stability
and fidelity of LIME explanations in scenarios with limited data. Our code is available
at https://github.com/rehanrazaa/ITL-LIME.
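A minimal sketch of the retrieval-and-weighting steps described above: the source domain is clustered,
instances are retrieved from the cluster whose prototype is nearest to the target instance, combined with
the target's real neighbours, weighted by proximity, and used to fit a linear surrogate. The kernel-based
weights stand in for the contrastive encoder used in the paper, and all hyperparameters are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

def explain_with_source(x_target, X_target_nbrs, X_source, black_box,
                        n_clusters=5, kernel_width=1.0):
    # 1) partition the source domain and find the closest prototype
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_source)
    proto_idx = np.argmin(np.linalg.norm(km.cluster_centers_ - x_target, axis=1))
    retrieved = X_source[km.labels_ == proto_idx]

    # 2) combine retrieved source instances with the target's real neighbours
    Z = np.vstack([retrieved, X_target_nbrs])

    # 3) weight instances by proximity to the target instance
    d = np.linalg.norm(Z - x_target, axis=1)
    w = np.exp(-(d ** 2) / (kernel_width ** 2))

    # 4) fit a weighted linear surrogate to the black-box predictions
    surrogate = Ridge(alpha=1.0).fit(Z, black_box(Z), sample_weight=w)
    return surrogate.coef_   # local feature attributions
```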
Modality Alignment with Multi-scale Bilateral Attention for Multimodal Recommendation
- Kelin Ren
- Chan-Yang Ju
- Dong-Ho Lee
Multimodal recommendation systems are increasingly becoming foundational technologies
for e-commerce and content platforms, enabling personalized services by jointly modeling
users' historical behaviors and the multimodal features of items (e.g., visual and
textual). However, most existing methods rely on either static fusion strategies or
graph-based local interaction modeling, facing two critical limitations: (1) insufficient
ability to model fine-grained cross-modal associations, leading to suboptimal fusion
quality; and (2) a lack of global distribution-level consistency, causing representational
bias. To address these, we propose MambaRec, a novel framework that integrates local
feature alignment and global distribution regularization via attention-guided learning.
At its core, we introduce the Dilated Refinement Attention Module (DREAM), which uses
multi-scale dilated convolutions with channel-wise and spatial attention to align
fine-grained semantic patterns between visual and textual modalities. This module
captures hierarchical relationships and context-aware associations, improving cross-modal
semantic modeling. Additionally, we apply Maximum Mean Discrepancy (MMD) and contrastive
loss functions to constrain global modality alignment, enhancing semantic consistency.
This dual regularization reduces mode-specific deviations and boosts robustness. To
improve scalability, MambaRec employs a dimensionality reduction strategy to lower
the computational cost of high-dimensional multimodal features. Extensive experiments
on real-world e-commerce datasets show that MambaRec outperforms existing methods
in fusion quality, generalization, and efficiency. Our code has been made publicly
available at https://github.com/rkl71/MambaRec.
Fine-Grained Emotion Recognition via In-Context Learning
- Zhaochun Ren
- Zhou Yang
- Chenglong Ye
- Haizhou Sun
- Chao Chen
- Xiaofei Zhu
- Xiangwen Liao
Fine-grained emotion recognition aims to identify the emotional type in queries through
reasoning and decision-making processes, playing a crucial role in various systems.
Recent methods use In-Context Learning (ICL), enhancing the representation of queries
in the reasoning process through semantically similar examples, while further improving
emotion recognition by explaining the reasoning mechanisms. However, these methods
enhance the reasoning process but overlook the decision-making process. This paper
investigates decision-making in fine-grained emotion recognition through prototype
theory. We show that ICL relies on similarity matching between query representations
and emotional prototypes within the model, where emotion-accurate representations
are critical. However, semantically similar examples often introduce emotional discrepancies,
hindering accurate representations and causing errors. To address this, we propose
Emotion In-Context Learning (EICL), which introduces emotionally similar examples
and uses a dynamic soft-label strategy to improve query representations in the emotion
reasoning process. A two-stage exclusion strategy is then employed to assess similarity
from multiple angles, further optimizing the decision-making process. Extensive experiments
show that EICL significantly outperforms ICL on multiple datasets.
Is This News Still Interesting to You?: Lifetime-aware Interest Matching for News
Recommendation
- Seongeun Ryu
- Yunyong Ko
- Sang-Wook Kim
Personalized news recommendation aims to deliver news articles aligned with users'
interests, serving as a key solution to alleviate the problem of information overload
on online news platforms. While prior work has improved interest matching through
refined representations of news and users, the following time-related challenges remain
underexplored: (C1) leveraging the age of clicked news to infer users' interest persistence, and (C2) modeling the varying lifetime of news across topics and users. To jointly address these challenges, we propose a novel Lifetime-aware Interest
Matching framework for nEws recommendation, named LIME, which incorporates three key strategies: (1) User-Topic lifetime-aware age representation
to capture the relative age of news with respect to a user-topic pair, (2) Candidate-aware
lifetime attention for generating temporally aligned user representation, and (3)
Freshness-guided interest refinement for prioritizing valid candidate news at prediction
time. Extensive experiments on two real-world datasets demonstrate that LIME consistently
outperforms a wide range of state-of-the-art news recommendation methods, and its
model-agnostic strategies significantly improve recommendation accuracy.
Empirical Study of Over-Squashing in GNNs and Causal Estimation of Rewiring Strategies
- Danial Saber
- Amirali Salehi-Abari
Graph neural networks (GNNs) have exhibited state-of-the-art performance across a
wide range of domains. Yet message-passing GNNs suffer from over-squashing---exponential
compression of long-range information from distant nodes---which limits expressivity.
Rewiring techniques can ease this bottleneck, but their practical impacts are unclear
due to the lack of a direct empirical over-squashing metric. We propose a topology-focused
method for assessing over-squashing between node pairs using the decay rate of their
mutual sensitivity. We then extend these pairwise assessments to graph-level statistics.
Coupling these metrics with a within-graph causal design, we quantify how rewiring
strategies affect over-squashing on diverse graph- and node-classification benchmarks.
Our extensive empirical analyses show that most graph classification datasets suffer
from over-squashing (but to various extents), and rewiring effectively mitigates it---though
the degree of mitigation, and its translation into performance gains, varies by dataset
and method. We also found that over-squashing is less notable in node classification
datasets, where rewiring often increases over-squashing, and performance variations
are uncorrelated with over-squashing changes. These findings suggest that rewiring
is most beneficial when over-squashing is both substantial and corrected with restraint---while
overly aggressive rewiring, or rewiring applied to minimally over-squashed graphs,
is unlikely to help and may even harm performance. Our plug-and-play diagnostic tool
lets practitioners decide whether rewiring is likely to pay off.
On Verifiable Legal Reasoning: A Multi-Agent Framework with Formalized Knowledge Representations
- Albert Sadowski
- Jaroslaw A. Chudziak
Legal reasoning requires both precise interpretation of statutory language and consistent
application of complex rules, presenting significant challenges for AI systems. This
paper introduces a modular multi-agent framework that decomposes legal reasoning into
distinct knowledge acquisition and application stages. In the first stage, specialized
agents extract legal concepts and formalize rules to create verifiable intermediate
representations of statutes. The second stage applies this knowledge to specific cases
through three steps: analyzing queries to map case facts onto the ontology schema,
performing symbolic inference to derive logically entailed conclusions, and generating
final answers using a programmatic implementation that operationalizes the ontological
knowledge. This bridging of natural language understanding with symbolic reasoning
provides explicit and verifiable inspection points, significantly enhancing transparency
compared to end-to-end approaches. Evaluation on statutory tax calculation tasks demonstrates
substantial improvements, with foundational models achieving 76.4% accuracy compared
to 18.8% baseline performance, effectively narrowing the performance gap between reasoning
and foundational models. These findings suggest that modular architectures with formalized
knowledge representations can make sophisticated legal reasoning more accessible through
computationally efficient models while enhancing consistency and explainability in
AI legal reasoning, establishing a foundation for future research into more transparent,
trustworthy, and effective AI systems for the legal domain.
STGS: Spatio-temporal Graph Sparsification Using Reinforcement Learning
- Nasrin Shabani
- Amin Beheshti
- Yuankai Qi
- Venus Haghighi
- Jin Foo
- Jia Wu
Spatio-temporal graphs encode dynamic interactions across space and time, but their
size and complexity pose challenges for analysis and computation. Graph sparsification
provides an effective solution to these issues by reducing the number of edges while
preserving the essential structural and dynamic properties of the network. This reduction
is crucial for enhancing the interpretability of complex graphs, revealing hidden
patterns, and enabling more efficient computational analysis. However, real-world
graphs often exhibit continuous spatial and temporal evolution, which most existing
sparsification algorithms, primarily designed for static graphs, fail to address.
We introduce STGS (Spatio-Temporal Graph Sparsification), a reinforcement learning-based
framework for sparsifying spatio-temporal graphs. By learning to prune edges while
preserving key spatio-temporal patterns, STGS enables efficient analysis of evolving
systems. Experiments on real-world datasets demonstrate that STGS outperforms existing
methods in both structural preservation and downstream forecasting tasks.
General Adaptive Memory Allocation for Learned Bloom Filters
- You Shang
- Xiang He
- Ruiyuan Li
- Yingying Sun
- Guanyao Li
- Guangchao Yang
- Junbo Zhang
- Yu Zheng
Membership testing, which determines whether an element belongs to a set, is widely
used in fields like database systems and network applications. Bloom Filters (BFs)
can solve this problem efficiently but suffer from high False Positive Rates (FPRs)
and large memory requirements for massive datasets. Learned Bloom Filters (LBFs),
combining a learning model with a backup Bloom Filter, mitigate these issues by capturing
data distributions. However, the critical problem of memory allocation between the
learning model and the backup filter has usually been overlooked, despite its significant
impact on LBF performance under constrained budgets.
To this end, we propose Gama, to our knowledge the first General Adaptive Memory Allocation
framework for LBFs. Gama introduces two memory allocation strategies: a Loop-Based
method and a Bayesian-Based method. The Loop-Based method evaluates all configurations at
each training epoch, making it well-suited for scenarios with tight memory constraints.
However, it faces efficiency challenges under large memory budgets due to the requirement
for exhaustive evaluations. In contrast, the Bayesian-Based method efficiently navigates
the search space through probabilistic exploration, which reduces the number of configurations
evaluated and significantly improves efficiency while maintaining FPRs. Furthermore,
we propose a hybrid approach that combines their strengths to dynamically adapt to
different constraints. Experiments on three real-world datasets show that Gama can
achieve a relative performance improvement of 69% in terms of FPR in the best case.
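A minimal sketch of a loop-based allocation in the spirit described above: a grid of budget splits
between the learned model and the backup filter is evaluated and the split with the lowest estimated
false positive rate is kept. The model-quality callback and the classic Bloom-filter FPR approximation
are simplifying assumptions, not Gama's internals.

```python
def backup_bf_fpr(bits, n_keys):
    """Classic approximation for an optimally configured Bloom filter."""
    return 1.0 if n_keys == 0 else 0.6185 ** (bits / n_keys)

def overall_fpr(model_fpr, backup_fpr):
    """A negative query slips through via the model or, if rejected there,
    via the backup filter that stores the model's missed keys."""
    return model_fpr + (1.0 - model_fpr) * backup_fpr

def loop_based_allocation(total_bits, n_keys, evaluate_model, splits=20):
    """Try a grid of budget splits and return the one with the lowest FPR.
    evaluate_model(model_bits) -> (false_negative_rate, false_positive_rate)."""
    best = None
    for i in range(1, splits):
        model_bits = total_bits * i // splits
        backup_bits = total_bits - model_bits
        fnr, fpr = evaluate_model(model_bits)
        bf_fpr = backup_bf_fpr(backup_bits, int(n_keys * fnr))  # keys missed by the model
        cand = (overall_fpr(fpr, bf_fpr), model_bits, backup_bits)
        best = min(best, cand) if best else cand
    return best  # (estimated FPR, model bits, backup-filter bits)
```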
Retrieval-Augmented Image Captioning via Synthesized Entity-Aware Knowledge Representations
- Lin Shen
- Chenxu Cui
- Jinchao Zhang
- Haihui Fan
- Haotian Jin
- Bo Li
Retrieval-Augmented Image Captioning enhances the model's understanding of real-world
images by retrieving external knowledge. Existing methods mainly use original captions
or isolated entities related to the query image to help generate captions. However,
these methods make the model either imitate the caption style or fail to capture the
relationship between entities, resulting in a lack of diversity or inaccuracy in the
generated captions. To address these issues, we propose SEAR, a novel framework that
utilizes external Synthesized Entity-Aware knowledge Representations to improve captioning
performance. Specifically, SEAR clusters images based on scene-level and entity-level
features, synthesizes the images in each cluster into representative images that serve
as retrieval indexes, and simultaneously utilizes a large model to extract and supplement
structured knowledge graphs from the corresponding cluster captions. Furthermore, we design a
knowledge-graph pruner to prune the knowledge graph by retaining the most relevant
subgraphs to the query image. By undertaking these steps in an integrated manner,
SEAR enables the model to acquire non-redundant and structured information for generating
captions and avoid data-related privacy issues. Extensive experiments on MSCOCO, Flickr30k,
and NoCaps demonstrate the effectiveness of our method both in-domain and out-of-domain,
outperforming existing lightweight RAIC methods and remaining competitive with heavyweight
models.
EmoPerso: Enhancing Personality Detection with Self-Supervised Emotion-Aware Modelling
- Lingzhi Shen
- Xiaohao Cai
- Yunfei Long
- Imran Razzak
- Guanming Chen
- Shoaib Jameel
Personality detection from text is commonly performed by analysing users' social media
posts. However, existing methods heavily rely on large-scale annotated datasets, making
it challenging to obtain high-quality personality labels. Moreover, most studies treat
emotion and personality as independent variables, overlooking their interactions.
In this paper, we propose a novel self-supervised framework, EmoPerso, which improves
personality detection through emotion-aware modelling. EmoPerso first leverages generative
mechanisms for synthetic data augmentation and rich representation learning. It then
extracts pseudo-labeled emotion features and jointly optimizes them with personality
prediction via multi-task learning. A cross-attention module is employed to capture
fine-grained interactions between personality traits and the inferred emotional representations.
To further refine relational reasoning, EmoPerso adopts a self-taught strategy to
enhance the model's reasoning capabilities iteratively. Extensive experiments on two
benchmark datasets demonstrate that EmoPerso surpasses state-of-the-art models. The
source code is available at https://github.com/slz0925/EmoPerso.
ROKAN: Toward Interpretable and Domain-Robust Memory Behavior Modeling
- Xiaoxuan Shen
- Zhihai Hu
- Di Chen
- Jianwen Sun
- Shengyingjie Liu
Memory behavior modeling aims to predict individual performance over time and uncover
underlying cognitive mechanisms. However, existing approaches often struggle to balance
predictive accuracy, domain generalization, and model interpretability. To address
this, we propose ROKAN, a cognitively inspired and symbolically interpretable memory
modeling framework. Based on the Multiscale Context Model, ROKAN formalizes the evolution
of memory traces as a differentiable Ordinary Differential Equation system, implemented
via Kolmogorov-Arnold Networks to derive human-readable symbolic expressions. To enhance
generalization across heterogeneous learning domains, we design an Adaptive Domain-Aware
loss function, which integrates Empirical Risk Minimization with Distributionally
Robust Optimization through dynamic domain-aware weighting. Our experiments demonstrate
that ROKAN significantly outperforms existing mainstream methods in both predictive
accuracy and domain generalization. The symbolic expressions were found to exhibit
formal consistency with classical memory theories, which lends support to the model's
theoretical assumptions and empirical performance, and provides a new pathway toward
theoretically grounded white-box memory modeling. Our code is available at https://github.com/hellowads/ROKAN.
Towards Few-shot Chemical Reaction Outcome Prediction
- Yili Shen
- Yijun Tian
- Cheng-Wei Ju
- Olaf Wiest
- Xiangliang Zhang
Accurate chemical reaction prediction is essential for drug discovery and synthetic
planning. However, this task becomes particularly challenging in low-data scenarios,
where novel reaction types lack sufficient training examples. To address this challenge,
we propose FewRxn, a novel model-agnostic few-shot reaction prediction framework that
enables rapid adaptation to unseen reaction types using only a few training samples.
FewRxn integrates several key innovations, including segmentation masks for enhanced
reactant representation, fingerprint embeddings for richer molecular context, and
task-aware meta-learning for effective knowledge transfer. Through extensive evaluations,
FewRxn achieves state-of-the-art accuracy in few-shot settings, significantly outperforming
traditional fine-tuning methods. Additionally, our work provides insights into the
impact of molecular representations on reaction knowledge transfer, demonstrating
that knowledge captured under a molecular graph-based formulation consistently outperforms
that learned through SMILES generation in few-shot learning.
Local Structure-Adaptive Graph Filtering for Collaborative Filtering
- Yijun Sheng
- Ximing Chen
- Yanyan Liu
- Pui Ieng Lei
- Zhiguo Gong
The structural heterogeneity of user-item interaction graphs poses a fundamental challenge
for graph-based recommender systems. While Graph Convolutional Networks (GCNs) have
achieved remarkable success in collaborative filtering, their uniform low-pass filtering
nature often fails to accommodate the varying spectral needs of nodes with different
local structures, resulting in suboptimal performance. To address this issue, we propose
a Structurally Sensitive Adaptive Graph Filter, dubbed SSAGF, a novel framework that enables structure-aware filtering on user-item graphs. SSAGF
first clusters nodes based on local structural properties, then learns customized
filters per group by reinterpreting the convolutional depth in GCNs. This adaptive
mechanism ensures that nodes with distinct structural roles are treated appropriately,
enhancing both accuracy and fairness without significantly increasing model complexity.
To further improve scalability, SSAGF avoids costly eigenvalue decompositions by approximating
spectral filters through Maclaurin series expansion, transforming the convolution
into a pooling-like operation over standard GCN outputs. Extensive experiments on
four benchmark datasets demonstrate that SSAGF consistently outperforms competitive
baselines, especially in scenarios with high structural heterogeneity, offering a
principled and efficient solution for structure-aware recommendation.
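A minimal sketch of structure-adaptive polynomial filtering in the spirit of the approach above: spectral
filters are approximated by a truncated series over propagated signals, with separate coefficients per
structural group, avoiding any eigendecomposition. The degree-based grouping and the fixed coefficients
are assumptions for illustration.

```python
import numpy as np

def propagate(A_norm, X, K):
    """Return [X, AX, A^2 X, ..., A^K X] from a normalised adjacency matrix."""
    outs, H = [X], X
    for _ in range(K):
        H = A_norm @ H
        outs.append(H)
    return outs

def adaptive_filter(A_norm, X, degrees, coeffs_by_group, K=3):
    """coeffs_by_group[g] is a length-(K+1) coefficient vector for structural group g."""
    outs = propagate(A_norm, X, K)
    groups = np.digitize(degrees, bins=[np.median(degrees)])    # two structural groups
    Z = np.zeros_like(X)
    for g, coeffs in coeffs_by_group.items():
        idx = groups == g
        Z[idx] = sum(c * H[idx] for c, H in zip(coeffs, outs))  # pooling over propagation depths
    return Z

n, d = 6, 4
A = (np.random.rand(n, n) > 0.6).astype(float); A = np.maximum(A, A.T)
deg = A.sum(1); A_norm = A / np.clip(deg[:, None], 1, None)
Z = adaptive_filter(A_norm, np.random.randn(n, d), deg,
                    {0: [1.0, 0.5, 0.25, 0.125], 1: [1.0, 0.9, 0.0, 0.0]})
```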
Learning Invariant Reliability under Diverse Contexts for Robust Multimedia Recommendation
- Yijun Sheng
- Pui Ieng Lei
- Yanyan Liu
- Ximing Chen
- Zhiguo Gong
In graph-based multimedia recommendation, accurately modeling item-item semantic similarity
is crucial for constructing high-quality semantic structures. However, multimodal
content often exhibits semantic inconsistencies across modalities, resulting in noisy
or misleading similarity signals. We refer to this as modality mismatching, where unaligned representations, such as an image conveying semantics unrelated
to its accompanying text, undermine the reliability of feature-based similarity estimation.
Importantly, modality consistency is context sensitive, varying with the underlying
semantic environment in which modalities are interpreted. This highlights the necessity of jointly modeling
modality reliability and contextual semantics. To address this challenge, we propose
RGSLMRec, a robust graph structure learning framework that models and exploits the semantic
reliability of multimodal features under diverse contexts. At its core, RGSLMRec builds
on the invariant learning paradigm and introduces two key innovations: (i) it simulates multiple perturbed
semantic environments and employs environment-specific monotonic networks to estimate
reliability; and (ii) it adopts a risk-invariant objective based on Variance Risk Extrapolation to enforce the learning of invariant reliability across environments. On top of this,
(iii) RGSLMRec constructs reliability-guided item-item graphs and captures collaborative
and semantic signals via a hybrid early-late fusion strategy. Extensive experiments
on several real-world datasets and additional synthetically perturbed datasets demonstrate
that RGSLMRec not only outperforms strong baselines but also exhibits superior robustness
to modality mismatching.
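For readers unfamiliar with Variance Risk Extrapolation, the risk-invariant objective in (ii) is, in its textbook form, the mean of the per-environment risks plus a penalty on their variance. A minimal PyTorch sketch follows; the penalty weight is an arbitrary illustration, not the paper's setting.

```python
import torch

def vrex_objective(per_env_losses, beta=10.0):
    """Mean risk across environments plus a variance penalty, which pushes the
    model toward reliability estimates that are invariant to the environment."""
    risks = torch.stack(per_env_losses)   # one scalar risk per perturbed environment
    return risks.mean() + beta * risks.var(unbiased=False)

# toy usage: risks from three simulated semantic environments
losses = [torch.tensor(0.42), torch.tensor(0.55), torch.tensor(0.47)]
total = vrex_objective(losses)
```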
Continuous Data Augmentation via Condition-Tokenized Diffusion Transformer for Sequential
Recommendation
- Chenglong Shi
- Haosen Wang
- Pan Tang
Data augmentation plays a crucial role in enhancing sequential recommendation (SR)
by providing richer training signals. Recently, diffusion models (DMs) have been introduced
into SR to generate realistic interaction sequences. However, existing DM-based methods
face three key limitations: (1) semantic deviation. The rounding procedure, which maps continuous embeddings to discrete item sequences,
may introduce semantic deviation; (2) preference misalignment. The explicit preference guidance is neglected during generation, resulting in synthetic
sequences that misalign with users' actual interests; (3) suboptimal training strategy. The two-stage methods, which train the SR model and the DM separately, overlook their
potential complementarity. To address these challenges, we propose Continuous Data Augmentation via Condition-Tokenized Diffusion Transformer for Sequential Recommendation (CATDiT). Specifically, CATDiT discards the rounding operation and leverages continuous embeddings
as augmented data to preserve semantic integrity. Then, we guide the generation process
with user intent via a condition-tokenized Diffusion Transformer, aligning synthetic
sequences with users' real preferences. Finally, we propose an alternating optimization
strategy to enable mutual learning between the SR model and the DM. Extensive experiments
on five real-world datasets demonstrate that CATDiT consistently outperforms state-of-the-art
baselines, validating its effectiveness in generating high-quality sequences and improving
SR performance.
Incremental Learning for LLM-based Tokenization and Recommendation
- Haihan Shi
- Xinyu Lin
- Wenjie Wang
- Wentao Shi
- Junwei Pan
- Jiang Jie
- Fuli Feng
Large Language Models for Recommendation (LLM4Rec) have shown great potential. Many
LLM4Rec approaches technically leverage a learnable tokenizer to assign item identifiers
and then enable a Recommender LLM (RecLLM) to process tokenized items and user interactions
for recommendation. However, a key challenge in their real-world deployment is the
need for continuous retraining over time to accommodate new items and evolving user
interests. While existing retraining methods can be applied to RecLLMs, learnable
tokenizers introduce additional retraining challenges. We conduct a comprehensive
investigation into the joint retraining of RecLLMs and learnable tokenizers, identifying
key issues such as identifier collision and identifier shifts across periods. To address
these, we propose Reformer, an incremental learning framework to fine-tune RecLLMs
and learnable tokenizers at each period. Reformer employs a dynamic codebook to mitigate
identifier collision by appending new codes and enforcing a diversity-oriented code
assignment constraint. Additionally, Reformer adopts an identifier freezing strategy
to ensure the invariance of previously assigned item identifiers across retraining
periods. We instantiate Reformer on two representative RecLLMs and conduct extensive
experiments on three real-world datasets. Substantial results demonstrate its superior
retraining performance, facilitating the real-world deployment of LLM4Rec.
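The two retraining ideas above, a grow-only codebook and identifier freezing, can be pictured with the toy sketch below. The nearest-code rule and novelty threshold are hypothetical simplifications and merely stand in for Reformer's diversity-oriented code assignment constraint.

```python
import numpy as np

class DynamicCodebook:
    """Toy grow-only codebook with identifier freezing across retraining periods."""
    def __init__(self, init_codes):
        self.codes = [np.asarray(c, dtype=float) for c in init_codes]  # append-only
        self.frozen = {}  # item -> identifier assigned in an earlier period

    def tokenize(self, item, item_vec, novelty_threshold=0.5):
        if item in self.frozen:                 # freezing: keep the old identifier
            return self.frozen[item]
        sims = [float(np.dot(item_vec, c)) for c in self.codes]
        best = int(np.argmax(sims))
        if sims[best] < novelty_threshold:      # far from every code: append a new one
            self.codes.append(np.asarray(item_vec, dtype=float))
            best = len(self.codes) - 1
        self.frozen[item] = best                # later periods reuse this identifier
        return best

cb = DynamicCodebook(init_codes=[[1.0, 0.0], [0.0, 1.0]])
print(cb.tokenize("item_a", np.array([0.9, 0.1])))    # maps to an existing code
print(cb.tokenize("item_b", np.array([-0.7, -0.7])))  # appends a new code
```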
SimFormer: Multilevel Transformer on Learnable Mesh Graphs for Engineering Simulation
- Jiasheng Shi
- Fu Lin
- Weixiong Rao
- Ze Gao
Numerical simulation is important in real-world engineering systems, such as solid
mechanics and aero-dynamics. Hierarchical GNNs can learn engineering simulation with
low simulation time and acceptable accuracy, but fail to represent complex interactions
in simulation systems. In this paper, we propose a novel multilevel Transformer on
learnable clusters, namely SimFormer. The key novelty of SimFormer is to interweave the learning of a soft-cluster assignment algorithm with inter-cluster/cluster-to-node attention. In the form of a closed loop, SimFormer learns the soft cluster assignment probabilities from the feedback signals provided by
the attention, and the attention can leverage the learnable clusters to better represent
long-range interactions. In this way, the learnable clusters can adaptively match
actual simulation results, and the multilevel attention modules can also effectively
represent node embeddings. Experiments on four datasets demonstrate the superiority
of SimFormer over seven baseline approaches. For example, on the real-world dataset, SimFormer outperforms
the recent work Eagle by 17.36% lower RMSE and 27.03% smaller FLOPs. The code and
datasets are available at: https://github.com/pro-orp/SimFormer.
Community Partition-based Source Localization with Adaptive Observers Deployment
- Jinchen Shi
- Yang Fang
- Zhen Tan
- Xin Zhang
- Xiang Zhao
In the contemporary era, characterized by an accelerated development in the domain
of social networks, the phenomenon of fake news has attained unprecedented levels
of prevalence, exerting substantial detrimental influence on society. Identification
of the sources of such information in a timely manner is of paramount importance in
order to prevent further damage. Existing source localization methods can be categorized
into two distinct approaches: the first involves the deployment of observers followed
by localization, while the second employs traditional community partitioning for source
localization without considering community structure in observer deployment, resulting
in suboptimal information acquisition. To address this issue, we propose Community
Partition-Based Source Localization with Adaptive Observers Deployment (CSOL), which
consists of three stages: In the first stage, community partitioning is achieved using
contrastive learning with optimization and a feature extraction module that is highly
correlated with the partitioning. In the second stage, we are the first to adaptively deploy observers based on community importance, integrating community partitioning
with observer placement. In the third stage, an early source estimation strategy is
employed to enhance efficiency and accuracy. Experimental results in real-world networks
demonstrate that CSOL outperforms other SOTA methods in both accuracy and efficiency.
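As a rough illustration of the adaptive deployment stage, one can split an observer budget across communities in proportion to an importance score and place observers on each community's highest-degree nodes. The proportional quota and degree-based placement below are simplifying assumptions, not CSOL's exact rule.

```python
import networkx as nx

def deploy_observers(G, communities, importance, budget):
    """Allocate `budget` observers across communities by importance, then pick
    the highest-degree nodes inside each community."""
    total = sum(importance[c] for c in communities)
    observers = []
    for c, nodes in communities.items():
        quota = max(1, round(budget * importance[c] / total))
        ranked = sorted(nodes, key=G.degree, reverse=True)
        observers.extend(ranked[:quota])
    return observers[:budget]

G = nx.karate_club_graph()
comms = {0: list(range(0, 17)), 1: list(range(17, 34))}
imp = {0: 0.6, 1: 0.4}
print(deploy_observers(G, comms, imp, budget=5))
```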
MSOFormer: Multi-scale Transformer with Orthogonal Embedding and Frequency Modeling
for Multivariate Time Series Forecasting
- Qin Shi
- Chu Xu
- Zongtang Hu
- Dong Shen
- Dapeng Sun
- Lijun Quan
Multivariate Time Series Forecasting (MTSF) plays a critical role in diverse practical
applications. Although Transformer-based models have recently achieved impressive
results in this field, their performance is still hindered by three core challenges:
complex temporal dependencies, diverse inter-variable correlations, and patterns that
span multiple time scales. To address these issues, we propose MSOFormer, a Multi-scale
Transformer with Orthogonal Embedding and Frequency Modeling. Specifically, the Dynamic
Frequency Filter adaptively weights frequency components across variables based on
input characteristics, enabling full-spectrum modeling and precise extraction of key
frequency patterns. To improve inter-variable representation, we introduce Orthogonal
Embedding, a novel projection strategy for queries and keys that enhances feature
diversity in channel-wise self-attention. In addition, Multi-scale Patch Embedding
captures temporal features across different scales, providing a comprehensive time
series representation. To evaluate MTSF in cloud-native environments, we construct
the first three Cloud Kafka cluster datasets, specifically curated for elastic message
queue scaling scenarios. Extensive experiments across eleven real-world benchmark
datasets demonstrate that MSOFormer consistently outperforms existing state-of-the-art
methods, highlighting its effectiveness and broad applicability.
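The Dynamic Frequency Filter can be pictured as input-conditioned re-weighting of rFFT components per variable. The sketch below shows that generic pattern; the mean-pooling over time and the sigmoid weighting are assumptions rather than the paper's exact design.

```python
import torch

def dynamic_frequency_filter(x, weight_net):
    """x: (batch, length, vars). Re-weight rFFT components per variable with
    weights predicted from the input, then map back to the time domain."""
    Xf = torch.fft.rfft(x, dim=1)                   # (B, F, V) complex spectrum
    w = torch.sigmoid(weight_net(x.mean(dim=1)))    # input-conditioned weights
    w = w.view(Xf.shape[0], Xf.shape[1], Xf.shape[2])
    return torch.fft.irfft(Xf * w, n=x.shape[1], dim=1)

B, L, V = 2, 96, 3
freq_bins = L // 2 + 1
net = torch.nn.Linear(V, freq_bins * V)
y = dynamic_frequency_filter(torch.randn(B, L, V), net)
```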
Benefit from Rich: Tackling Search Interaction Sparsity in Search Enhanced Recommendation
- Teng Shi
- Weijie Yu
- Xiao Zhang
- Ming He
- Jianping Fan
- Jun Xu
In modern online platforms, search and recommendation (S&R) often coexist, offering opportunities for performance improvement through search-enhanced
approaches. Existing studies show that incorporating search signals boosts recommendation
performance. However, the effectiveness of these methods relies heavily on rich search
interactions. They primarily benefit a small subset of users with abundant search
behavior, while offering limited improvements for the majority of users who exhibit
only sparse search activity. To address the problem of sparse search data in search-enhanced
recommendation, we face two key challenges: (1) how to learn useful search features
for users with sparse search interactions, and (2) how to design effective training
objectives under sparse conditions. Our idea is to leverage the features of users
with rich search interactions to enhance those of users with sparse search interactions.
Based on this idea, we propose GSERec, a method that utilizes message passing on the User-Code Graphs to alleviate data sparsity in Search-Enhanced Recommendation. Specifically, we utilize Large Language Models (LLMs) with vector quantization
to generate discrete codes, which connect similar users and thereby construct the
graph. Through message passing on this graph, embeddings of users with rich search
data are propagated to enhance the embeddings of users with sparse interactions. To
further ensure that the message passing captures meaningful information from truly
similar users, we introduce a contrastive loss to better model user similarities.
The enhanced user representations are then integrated into downstream search-enhanced
recommendation models. Experiments on three real-world datasets show that GSERec consistently
outperforms baselines, especially for users with sparse search behaviors.
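A minimal sketch of the propagation idea follows: users sharing an LLM-derived discrete code are linked through that code, embeddings of search-rich users are averaged into each code, and search-sparse users mix those code messages into their own embeddings. The mixing coefficient and the two-hop averaging are illustrative assumptions, and the contrastive loss is omitted.

```python
import numpy as np

def propagate_over_codes(user_emb, user_codes, rich_users, alpha=0.5):
    """user_emb: dict user -> vector; user_codes: dict user -> list of code ids."""
    code_sum, code_cnt = {}, {}
    for u in rich_users:                       # aggregate search-rich users into codes
        for c in user_codes[u]:
            code_sum[c] = code_sum.get(c, 0.0) + user_emb[u]
            code_cnt[c] = code_cnt.get(c, 0) + 1
    enhanced = dict(user_emb)
    for u, codes in user_codes.items():
        msgs = [code_sum[c] / code_cnt[c] for c in codes if c in code_cnt]
        if u not in rich_users and msgs:       # enhance search-sparse users only
            enhanced[u] = (1 - alpha) * user_emb[u] + alpha * np.mean(msgs, axis=0)
    return enhanced
```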
Docking-Aware Attention: Dynamic Protein Representations through Molecular Context
Integration
- Amitay Sicherman
- Kira Radinsky
Computational prediction of enzymatic reactions represents a crucial challenge in
sustainable chemical synthesis across various scientific domains, ranging from drug
discovery to materials science and green chemistry. These syntheses rely on highly
adaptable protein catalysts that perform different molecular transformations depending
on their molecular partners. Current approaches to protein representation in reaction
prediction either ignore protein structure entirely or rely on static embeddings,
failing to capture how proteins dynamically adapt their behavior to different substrates.
We present Docking-Aware Attention (DAA), a novel architecture that generates dynamic,
context-dependent protein representations by incorporating molecular docking information
into the attention mechanism. DAA combines physical interaction scores from docking
predictions with learned attention patterns to focus on protein regions most relevant
to specific molecular interactions. We evaluate our method on enzymatic reaction prediction,
where it outperforms previous state-of-the-art methods, demonstrating a 9.5% relative
improvement on complex molecules and a 12.3% relative improvement on innovative reactions.
Furthermore, we demonstrate the generalization capabilities of our learned representations
on a Drug-Target Interaction task. We show how DAA generates interpretable attention
patterns that adapt to different molecular contexts through detailed ablation studies
and visualizations. Our approach represents a general framework for context-aware
protein representation, with potential applications across enzymatic synthesis planning
and other protein-molecule interaction tasks. We open-source our implementation and
pre-trained models to facilitate further research.
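One simple way to combine physical interaction scores with learned attention is to add the scaled docking scores as a bias on the attention logits before the softmax. The sketch below shows that generic pattern, not DAA's exact formulation; the bias weight is an assumption.

```python
import torch

def docking_aware_attention(q, k, v, docking_scores, lam=1.0):
    """q: (B, Lq, d); k, v: (B, Lk, d); docking_scores: (B, Lk) per protein region.
    Attention is biased toward regions with high docking scores for this ligand."""
    logits = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)   # (B, Lq, Lk)
    logits = logits + lam * docking_scores.unsqueeze(1)        # add docking bias
    attn = torch.softmax(logits, dim=-1)
    return attn @ v
```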
InstANNS: Scalable Approximate Nearest Neighbor Search via Cost-Efficient In-Storage
Processing
- Bonggeun Sim
- Yushin Kim
- Minseo Kim
- Yeonhong Park
- Jae W. Lee
Billion-scale approximate nearest neighbor search (ANNS) increasingly relies on disk-based
indexes due to the rapid growth of modern datasets. Existing disk-augmented indexing
systems, such as SPANN, often face performance bottlenecks due to limited host interface
bandwidth, typically constrained by PCIe. To address this bottleneck, we introduce
InstANNS, a storage-centric ANNS architecture that improves throughput and reduces
data transfer by performing query-aware PQ filtering inside SSDs, without relying
on GPUs. By offloading distance computations to the SSD controller and utilizing abundant
internal bandwidth, our design transfers only highly relevant candidates to the host,
significantly reducing PCIe traffic. To further optimize performance, we propose co-occurrence-aware PQ code placement, which co-locates frequently co-accessed candidates, and conditional PQ bypass, which reduces NAND reads by skipping low-utility filtering. We prototype InstANNS
by extending SPANN and evaluate it using a device-level SSD simulator, meticulously
calibrated against measurements from an actual SSD controller SoC (System-on-Chip)
to ensure accurate performance evaluation. Experimental results show that InstANNS
improves QPS by 2.15× over SPANN and QPS-per-dollar by 1.7× over FusionANNS at 90%
Recall@10, while maintaining accuracy.
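The filtering that InstANNS pushes inside the SSD boils down to standard product-quantization asymmetric distance computation: build a per-subspace lookup table for the query, score every candidate's PQ code by table lookups, and forward only the most promising candidates. A host-side NumPy sketch of that computation (the offload itself is out of scope here):

```python
import numpy as np

def pq_filter(query, codes, codebooks, keep):
    """codes: (N, M) integer codes; codebooks: (M, K, sub) centroids per subspace."""
    M, K, sub = codebooks.shape
    q_sub = query.reshape(M, sub)
    lut = ((codebooks - q_sub[:, None, :]) ** 2).sum(-1)   # (M, K) distance table
    approx = lut[np.arange(M), codes].sum(axis=1)          # (N,) approximate distances
    return np.argsort(approx)[:keep]                       # candidates sent to the host

rng = np.random.default_rng(0)
M, K, sub, N = 4, 256, 8, 10_000
cb = rng.normal(size=(M, K, sub))
codes = rng.integers(0, K, size=(N, M))
cands = pq_filter(rng.normal(size=M * sub), codes, cb, keep=100)
```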
Parse-LLM: A Prior-Free LLM Parser for Unknown System Logs
- Chengyu Song
- Lin Yang
- Jianming Zheng
- Jinzhi Liao
- Feng Yang
- Linru Ma
- Fei Cai
Log parsing extracts structured information from unstructured logs and serves as a
fundamental pre-processing step for various log-based analytics and monitoring tasks.
Recent advances have leveraged Large Language Models (LLMs) to handle log format complexities
and enhance parsing performance. However, these methods heavily rely on labeled data,
which is often scarce in rapidly evolving industrial systems, limiting their applicability
in real-world scenarios. Moreover, the sheer volume of logs results in slow parsing
and high computational costs, further hindering the deployment of LLM-based log parsing
systems. To address these issues, we propose Parse-LLM, an unsupervised end-to-end
log parsing framework based on LLMs. Specifically, we first develop a Log Decomposer
Agent that leverages Chain-of-Thought (CoT) reasoning and callable tools, enabling
the LLM to autonomously separate log headers from content. Next, we introduce the
Hybrid Log Partition module, which segments logs by balancing commonalities and differences.
Finally, we develop a novel Variation-aware Log Parsing module that allows the LLM
to harness additional supervisory signals through comparative analysis of similar
logs. Comprehensive experiments conducted on large-scale public datasets show that
Parse-LLM outperforms state-of-the-art log parsers in an unsupervised setting, offering
an effective and scalable solution for the practical application of unsupervised log
parsing.
ProxySampler: Proxy Informativeness Estimation for Efficient Data Selection in Active
Learning
- Miao-Hui Song
- Lan Zhang
- Mu Yuan
- Yijun Liu
Large-scale data analysis services require efficient periodic model updates to adapt
to the possibly changing data distributions. Manually labeling all available samples
for task model updates is infeasible for a large sample scale. Active learning technique
is proposed to iteratively select subsets of the most informative samples for labeling.
From our experience of applying active learning in a real-world video analysis system,
we identify a previously overlooked bottleneck of time cost: data selection. Existing
active learning methods select data by estimating informativeness (e.g., output confidence)
over all unlabeled samples in each iteration. This data selection process can take
up to 42% of the time cost of end-to-end model updates in our system (the total includes the time for manual labeling, data selection, and model updates). To address the
time cost bottleneck caused by data selection, we propose a new idea: proxy informativeness
estimation. We start with modeling the time cost of data selection, from which we
identify three key factors: unit estimation cost, the number of samples for estimation,
and the number of iteration rounds. The influence of the first two factors increases
cumulatively with the number of iteration rounds. Correspondingly, we design a proxy
estimator and a sample pooling method, respectively. Our proxy estimator is a lightweight
neural network for direct informativeness estimation to replace the role of the high-cost
task model, thus reducing the unit cost. In addition, our sample pooling method leverages
historical estimation results to narrow the scope of sample candidates. Based on the
above design, we develop ProxySampler, which can be integrated with various active learning approaches as a plug-in. Experimental
results show that integrating ProxySampler with state-of-the-art active learning methods can reduce the time cost by 53.6-83.3%
(a 2.15-6.01x speedup) while achieving the same accuracy.
Quantized Factor Identifiable Causal Effect Variational Autoencoder
- Sujeong Song
- Junghyo Sohn
- Eunsong Kang
- Heung-Il Suk
Causal inference involves determining how interventions affect outcomes and explaining
the underlying mechanisms, and it holds critical importance across various fields.
A key assumption in causal inference is that the measured covariates form a sufficient
adjustment set. However, this assumption often fails due to unobserved confounders,
as confounding mechanisms are rarely fully captured by measured covariates alone.
Recent research has attempted to address this challenge using variational autoencoders
(VAEs), but these approaches face practical limitations, including unidentifiability
and bias toward proxy variables. To overcome these issues, we propose a novel method
that incorporates quantized factor identifiability into VAEs for causal effect estimation.
This integration mitigates unidentifiability and reduces the dominance of proxy variables,
thereby enhancing consistency and accuracy in causal inference. Extensive experiments
on both simulated and real-world datasets demonstrate the robustness and effectiveness
of our method, establishing a new benchmark in deep causal modeling.
Unified Molecule Pre-training with Flexible 2D and 3D Modalities: Single and Paired
Modality Integration
- Tengwei Song
- Min Wu
- Yuan Fang
Molecular representation learning plays a crucial role in advancing applications such
as drug discovery and material design. Existing work leverages 2D and 3D modalities
of molecular information for pre-training, aiming to capture comprehensive structural
and geometric insights. However, these methods require paired 2D and 3D molecular
data to train the model effectively and prevent it from collapsing into a single modality,
posing limitations in scenarios where a certain modality is unavailable or computationally
expensive to generate. To overcome this limitation, we propose FlexMol, a flexible
molecule pre-training framework that learns unified molecular representations while
supporting single-modality input. Specifically, inspired by the unified structure
in vision-language models, our approach employs separate models for 2D and 3D molecular
data, leverages parameter sharing to improve computational efficiency, and utilizes
a decoder to generate features for the missing modality. This enables a multistage
continuous learning process where both modalities contribute collaboratively during
training, while ensuring robustness when only one modality is available during inference.
Extensive experiments demonstrate that FlexMol achieves superior performance across
a wide range of molecular property prediction tasks, and we also empirically demonstrate
its effectiveness with incomplete data. Our code and data are available at https://github.com/tewiSong/FlexMol.
GSTBench: A Benchmark Study on the Transferability of Graph Self-Supervised Learning
- Yu Song
- Zhigang Hua
- Yan Xie
- Jingzhe Liu
- Bo Long
- Hui Liu
Self-supervised learning (SSL) has shown great promise in graph representation learning.
However, most existing graph SSL methods are developed and evaluated under a single-dataset
setting, leaving their cross-dataset transferability largely unexplored and limiting
their ability to leverage knowledge transfer and large-scale pretraining, factors
that are critical for developing generalized intelligence beyond fitting training
data. To address this gap and advance foundation model research for graphs, we present
GSTBench, the first systematic benchmark for evaluating the transferability of graph SSL methods.
We conduct large-scale pretraining on ogbn-papers100M and evaluate five representative SSL methods across a diverse set of target graphs.
Our standardized experimental setup decouples confounding factors such as model architecture,
dataset characteristics, and adaptation protocols, enabling rigorous comparisons focused
solely on pretraining objectives. Surprisingly, we observe that most graph SSL methods
struggle to generalize, with some performing worse than random initialization. In
contrast, GraphMAE, a masked autoencoder approach, consistently improves transfer
performance. We analyze the underlying factors that drive these differences and offer
insights to guide future research on transferable graph SSL, laying a solid foundation
for the ''pretrain-then-transfer'' paradigm in graph learning. Our code is available
at https://github.com/SongYYYY/GSTBench.
DiRW: Path-Aware Digraph Learning for Heterophily
- Daohan Su
- Xunkai Li
- Zhenjun Li
- Yinping Liao
- Rong-Hua Li
- Guoren Wang
Recently, graph neural network (GNN) has emerged as a powerful representation learning
tool for graph-structured data. However, most approaches are tailored for undirected
graphs, neglecting the abundant information in the edges of directed graphs (digraphs).
In fact, digraphs are widely used in the real world and have been shown to help address heterophily
challenges. Despite recent advancements, existing spatial- and spectral-based DiGNNs
have limitations due to their complex learning mechanisms and reliance on high-quality
topology, resulting in low efficiency and unstable performance. To address these issues,
we propose Directed Random Walk (DiRW), a plug-and-play strategy for most spatial-based
DiGNNs and also an innovative model which offers a new digraph learning paradigm.
Specifically, it utilizes a direction-aware path sampler optimized from the perspectives
of walk probability, length, and number in a weight-free manner by considering node
profiles and topologies. Building upon this, DiRW incorporates a node-wise learnable
path aggregator for generalized node representations. Extensive experiments on 9 datasets
demonstrate that DiRW: (1) enhances most spatial-based methods as a plug-and-play
strategy; (2) achieves SOTA performance as a new digraph learning paradigm. The source
code and data are available at https://github.com/dhsiuu/DiRW.
Extracting Global Temporal Patterns Within Short Look-Back Windows for Traffic Forecasting
- Bo Sun
- Zhe Wu
- Zhiyuan Deng
- Li Su
- Qingfang Zheng
With the continuous expansion of urban areas, accurate and effective traffic forecasting
has become essential for intelligent urban traffic management. As traffic data inherently
exhibits temporal dynamics, modeling its temporal patterns is critical to improve
prediction performance. However, constrained by computational complexity, existing
methods rely primarily on short-term historical data, which is typically noisy and
limits the ability to capture global temporal patterns. To address this issue, we
propose a novel Dual-Stream Transformer model (DSformer) that effectively captures
global temporal patterns through a time-index model. To mitigate the impact of noise
in short look-back windows, DSformer explicitly learns a temporal matrix that encodes
structured temporal dependencies. Furthermore, we design a time-index loss that encourages
similar representations for adjacent time indices, thereby reducing error propagation
across time steps. In parallel, a historical-value stream is employed to model local
information. Finally, a self-adaptive learning module is constructed to flexibly and
accurately fuse global and local information. Extensive experiments on real-world
traffic forecasting tasks across ten diverse scenarios demonstrate that our method
consistently outperforms state-of-the-art baselines while maintaining competitive
efficiency. The code is available at https://github.com/sky836/DSFormer.git.
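The time-index loss described above can be as simple as a smoothness penalty between representations of adjacent time indices. A minimal PyTorch sketch (the squared-difference form is an assumption, not necessarily the paper's exact loss):

```python
import torch

def time_index_loss(time_index_repr):
    """Pull representations of adjacent time indices together to reduce error
    propagation across time steps. time_index_repr: (T, d)."""
    diff = time_index_repr[1:] - time_index_repr[:-1]
    return (diff ** 2).sum(dim=-1).mean()

reps = torch.randn(288, 64, requires_grad=True)   # e.g. 288 five-minute slots per day
loss = time_index_loss(reps)
loss.backward()
```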
Discovering Group Collapser for Network Resilience
- Guozhang Sun
- Haoyuan Wang
- Yuhai Zhao
- Zhengkui Wang
- Yuan Li
- Xingwei Wang
Network resilience refers to the ability of a network to maintain its functionality
despite perturbations, where resilience/robustness is shown when a substantial proportion
of its nodes remain engaged even under changes. Such phenomenon is common in real-world
networks, such as computing power networks. Previous works demonstrate that the coreness
of a user/node effectively captures the dynamics of user engagement. However, most
existing works only consider changes in a single coreness value and thus fail to measure
the overall network resilience. Subsequent works are either inefficient or do not
consider coreness-decreased scenario. In this paper, we propose and study the collapsed
follower maximization problem, aiming to maximize the number of coreness-decreased
vertices by finding a group collapser (collapsing a set of vertices) with a given
budget. We prove that the problem is NP-hard and W[2]-hard parameterized by the budget
b. To address the problem, we first present a Greedy algorithm that iteratively finds
the best collapser in each of the budget b iterations. To further optimize the Greedy algorithm, we propose GreedyOpt, which
leverages the shell component structure to accelerate the computation of followers for a collapser and prune the search space. Extensive experimental results on 8 real-world datasets demonstrate the effectiveness and efficiency of our algorithms.
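For intuition, a plain (unoptimized) version of the Greedy stage can be written with NetworkX core numbers: in each of the b iterations, pick the vertex whose removal lowers the coreness of the most remaining vertices. This sketch deliberately omits the shell-component acceleration and pruning used by GreedyOpt.

```python
import networkx as nx

def greedy_collapser(G, b):
    """Return b vertices that greedily maximize the number of coreness-decreased followers."""
    G = G.copy()
    chosen = []
    for _ in range(b):
        base = nx.core_number(G)
        best, best_gain = None, -1
        for v in list(G.nodes()):
            H = G.copy()
            H.remove_node(v)
            new = nx.core_number(H)
            gain = sum(1 for u in new if new[u] < base[u])   # followers of v
            if gain > best_gain:
                best, best_gain = v, gain
        chosen.append(best)
        G.remove_node(best)
    return chosen

print(greedy_collapser(nx.karate_club_graph(), b=2))
```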
AMBER: Adaptive Meta Balanced Paradigm for Heterogeneous Graph-Based Knowledge Tracing
- Lifan Sun
- Zichen Yuan
- Ersheng Ni
- Weihua Cheng
- Xinyuan Song
- Linkun Dai
- Hongwei Jiang
- Sibo Xu
- Mengmeng Chen
- Yucen Zhuang
- Yongxin Ni
- Youhua Li
Knowledge Tracing (KT) is a fundamental task in personalized education, aiming to
predict student performance by modeling their evolving concept mastery. Recent state-of-the-art
approaches adopt multi-graph architectures to capture diverse concept and behavior
relations. However, such models often suffer from graph imbalance, where one graph
branch dominates training, undermining the benefits of structural integration. To
address this, we propose AMBER (Adaptive Meta-Balanced Ensemble Representation learning), a KT framework designed
to promote balanced learning across heterogeneous graph structures. AMBER introduces
an external dual-graph teacher to guide the learning of ensemble representations.
As the teacher itself may encode graph imbalance bias, we further incorporate a meta-distillation
strategy that adaptively adjusts the teacher using student feedback, amplifying signals
beneficial to underperforming branches. In addition, an adaptive graph rebalancing
strategy is introduced to balance the optimization of different graph branches in
real time, preventing dominance by any single structure. Experiments on three real-world
datasets show that AMBER consistently outperforms competitive baselines. By promoting
balanced optimization across graphs, AMBER enables more effective integration of heterogeneous
learning signals in KT, providing a robust and scalable solution for personalized
education. Code is available at https://github.com/AMBER2025KT/AMBER2025CIKM.
Hearing the Meaning, Not the Mess: Beyond Literal Transcription for Spoken Language
- Min Sun
- Ke Xu
- Jiarong Liu
- Jifan Yang
- Yan Fang
- Weizheng Wang
- Qipeng Xie
- Shuxin Zhong
- Kaishun Wu
With the rise of virtual communication and smart devices, speech has become the most
natural medium of interaction. Yet it remains intrinsically difficult: speech is fleeting,
unstructured, and disfluent, making key information prone to loss. Conventional Speech-to-Text
(STT) systems attempt to acoustically reconstruct what was said. However, their frame-level alignment and rigid token-by-token decoding break down
under noise, interruptions, or fragmentation. Humans, in contrast, readily grasp what was meant by exploiting syntax, discourse, pragmatics, and prosody. We argue for a paradigm
shift from acoustic reconstruction to semantic transduction: inferring meaning directly
from speech, abstracted from surface distortions. This shift raises two challenges:
(C1) the lack of anchors between audio and meaning, and (C2) the need to maintain compositional semantics. To address these, we introduce CogTrans,
a cognitively inspired speech-to-meaning framework. CogTrans tackles C1 through a Semantic Anchor Explorer, built on I-JEPA to capture higher-order regularities (prosodic rhythms, cross-frequency coarticulation, discourse continuity), providing resilient semantic scaffolds under noise and fragmentation. For C2, it designs a Lexical-Semantic Harmonizer that dynamically integrates these anchors with lexical embeddings, thereby preserving
fine-grained compositional fidelity in roles, order, and entities. Extensive experiments
show that CogTrans delivers consistent and substantial gains under challenging conditions.
On GigaSpeech, it achieves a 6.58% relative Word Error Rate (WER) reduction, and on the multilingual
VoxPopuli benchmark, the gain climbs to 12.97% at 10 dB noise, a regime where conventional models typically collapse. Beyond literal accuracy, CogTrans also boosts semantic fidelity,
with a 3.40% increase in ROUGE-L and 3.45% in USE-Sim, ensuring transcripts remain
faithful not only in words but also in meaning. Together, these results underscore
that CogTrans is robust in noisy, unconstrained environments, precisely the conditions where reliability matters most.
Dynamic Ensemble Member Selection for Data Stream Classification
- Yibin Sun
- Bernhard Pfahringer
- Heitor Murilo Gomes
- Albert Bifet
Ensemble methods are widely recognized for their effectiveness in data stream classification.
This paper introduces Dynamic Ensemble Member Selection (DEMS), a novel framework
that dynamically selects a subset of classifiers from an ensemble for each individual
prediction. DEMS ranks base learners based on estimated accuracy and predictive margin,
using only the top-K members for prediction, where K is optimized in a self-adaptive
manner. The proposed method significantly enhances predictive performance across various
state-of-the-art ensemble algorithms, including Streaming Random Patches, Adaptive
Random Forest, and Online Smooth Boost. Experimental results demonstrate that DEMS
consistently improves classification accuracy while maintaining a minimal runtime
overhead of just 11.66% compared to the original methods. This work highlights the
potential of DEMS in adapting to concept drift and optimizing ensemble diversity,
offering a practical solution for real-time data stream classification.
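The per-instance selection step can be sketched as follows; the member interface and the fixed k are hypothetical simplifications, since DEMS adapts K itself and works on evolving stream statistics.

```python
def dems_predict(members, x, k):
    """members: list of (estimated_accuracy, model) pairs, where model exposes
    predict_proba(x) -> {label: probability}. Vote over the top-k members only."""
    scored = []
    for acc, model in members:
        proba = model.predict_proba(x)
        top2 = sorted(proba.values(), reverse=True)[:2]
        margin = top2[0] - (top2[1] if len(top2) > 1 else 0.0)
        scored.append((acc, margin, proba))
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)   # accuracy, then margin
    votes = {}
    for _, _, proba in scored[:k]:
        label = max(proba, key=proba.get)
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```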
MPFormer: Adaptive Framework for Industrial Multi-Task Personalized Sequential Retriever
- Yijia Sun
- Shanshan Huang
- Linxiao Che
- Haitao Lu
- Qiang Luo
- Kun Gai
- Guorui Zhou
Modern industrial recommendation systems encounter a core challenge of multi-stage
optimization misalignment: a significant semantic gap exists between the multi-objective
optimization paradigm (such as jointly optimizing click-through rate, watch duration,
and conversion rate) widely used in the ranking phase and the single-objective modeling
in the retrieval phase. Although the mainstream industry solution achieves multi-objective coverage through parallel multi-path single-objective retrieval, this approach leads
to linear growth of training and serving resources with the number of objectives and
has inherent limitations in handling loosely coupled objectives. This paper proposes
the MPFormer, a dynamic multi-task Transformer framework, which systematically addresses
the aforementioned issues through three innovative mechanisms. First, an objective-conditioned
transformer that jointly encodes user behavior sequences and multi-task semantics
through learnable attention modulation; second, personalized target weights are introduced
to achieve dynamic adjustment of retrieval results; finally, user personalization information
is incorporated into token representations and the Transformer structure to further
enhance the model's representation ability. This framework has been successfully integrated
into Kuaishou's short video recommendation system, stably serving over 400 million
daily active users. It significantly improves user daily engagement and system operational
efficiency. Practical deployment verification shows that, compared with traditional
solutions, it effectively optimizes the multi-objective retrieval iteration paradigm
while maintaining service response speed, providing a scalable multi-objective solution
for industrial recommendation systems.
On the Cross-type Homophily of Heterogeneous Graphs: Understanding and Unleashing
- Zhen Tao
- Ziyue Qiao
- Chaoqi Chen
- Zhengyi Yang
- Lun Du
- Qingqiang Sun
Homophily, the tendency of similar nodes to connect, is a fundamental phenomenon in
network science and a critical factor in the performance of graph neural networks
(GNNs). While existing studies primarily explore homophily in homogeneous graphs,
where nodes share the same type, real-world networks are often more accurately modeled
as heterogeneous graphs (HGs) with diverse node types and intricate cross-type interactions.
This structural diversity complicates the analysis of homophily, as traditional homophily
metrics fail to account for distinct label spaces across node types. To address this
limitation, we introduce the Cross-Type Homophily Ratio (CHR), a novel metric that
quantifies homophily based on the similarity of target information across different
node types. Additionally, we propose Cross-Type Homophily-guided Graph Editing (CTHGE), a novel method for improving the performance of heterogeneous graph neural networks (HGNNs) by optimizing cross-type connectivity using the Cross-Type Homophily Ratio. Extensive
experiments on five HG datasets with nine HGNNs validate the effectiveness of CTHGE,
which delivers a maximum relative performance improvement of over 25% for HGNNs on
node classification tasks, offering a fresh perspective on cross-type homophily in heterogeneous graph learning.
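The paper's exact metric is not reproduced here, but one plausible instantiation of a cross-type homophily ratio averages the similarity of the two endpoints' target-information vectors over cross-type edges:

```python
import numpy as np

def cross_type_homophily_ratio(edges, target_info):
    """edges: (u, v) pairs whose endpoints have different node types;
    target_info: dict node -> vector of target information (e.g., soft labels
    projected into a shared space). Returns the mean cosine similarity."""
    sims = []
    for u, v in edges:
        a, b = target_info[u], target_info[v]
        sims.append(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)))
    return float(np.mean(sims)) if sims else 0.0
```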
STKGNN: Scalable Spatio-Temporal Knowledge Graph Reasoning for Activity Recognition
- Gözde Ayşe Tataroğlu özbulak
- Yash Raj Shrestha
- Jean-Paul Calbimonte
The emergence of dynamic, high-volume data streams demands advanced reasoning frameworks
to capture complex spatio-temporal relationships that are essential for enabling contextual
understanding. However, current approaches often lack scalable and adaptable semantic
representations in dynamic and spatio-temporal scenarios. To address this need, we
introduce a novel Spatio-Temporal Knowledge approach based on Graph Neural Networks
(STKGNN) for activity recognition. This framework performs graph-based reasoning over
semantically enriched Spatio-Temporal Knowledge Graphs (STKGs) constructed from open-source
video datasets. By leveraging these custom STKGs, we propose three advanced Graph
Neural Network (GNN) based architectures to recognize various activities. Accordingly,
we establish a comprehensive approach for spatio-temporal reasoning that adapts to
diverse Knowledge Graph structures by addressing adaptability, scalability, and temporal
complexities. This framework enhances activity recognition and provides a foundation
for wider dynamic or real-time applications in different domains including healthcare,
autonomous systems, video surveillance, and various other fields.
ECG-Doctor: An Interpretable Multimodal ECG Diagnosis Framework Based on Large Language
Models
- Dongsheng Tian
- Junzhe Jiang
- Kai Zhang
- Changchun Liu
- Yu Yuan
- Min Gao
- Enhong Chen
Electrocardiogram (ECG) diagnosis aims to automatically classify ECG recordings into
clinically meaningful categories, playing a vital role in medical decision-making.
Deep learning methods, while promising, demand extensive annotated data and lack interpretability.
Large Language Models (LLMs) offer potential in low-data scenarios and generating
interpretable outputs, yet their application to ECG diagnosis, especially leveraging
multimodal data (e.g., raw signals, derived features, and clinical knowledge), remains
underexplored. To address these challenges, we propose ECG-Doctor, an interpretable
and multimodal ECG diagnosis framework based on LLMs. ECG-Doctor comprises four key
components: (1) ECG Knowledge Acquisition Module, which integrates external medical
knowledge and Chain-of-Thought (CoT) reasoning to address the inability of LLMs to
follow standardized ECG diagnostic procedures; (2) ECG Feature Extraction Module,
which incorporates domain knowledge to overcome LLMs' limitations in comprehensively
understanding structured ECG features; (3) ECG Waveform Analysis Module, which introduces
time-series ECG models to equip LLMs with the capability to interpret and reason over
raw ECG signal morphologies; (4) KNN-based ECG Retrieval Module, which retrieves the
top-k most similar ECG samples and guides LLMs through in-context learning (ICL),
enabling them to differentiate and learn from variations across ECGs. The outputs
of these modules are aggregated and provided to the LLM as diagnostic context, enabling
ICL to perform comprehensive ECG diagnosis. This design effectively simulates the
diagnostic reasoning process of experienced electrocardiologists. Extensive experiments
on the PTB-XL dataset demonstrate that ECG-Doctor is compatible with various LLMs
and consistently outperforms existing baselines at both 100 Hz and 500 Hz sampling
rates, showcasing its strong versatility and robustness. Furthermore, ECG-Doctor provides
well-grounded diagnostic explanations, highlighting its superior interpretability.
X-Troll: eXplainable Detection of State-Sponsored Information Operations Agents
- Lin Tian
- Xiuzhen Zhang
- Maria Myung-Hee Kim
- Jennifer Biggs
- Marian-Andrei Rizoiu
State-sponsored trolls, malicious actors who deploy sophisticated linguistic manipulation
in coordinated information campaigns, pose threats to online discourse integrity.
While Large Language Models (LLMs) achieve strong performance on general natural language
processing (NLP) tasks, they struggle with subtle propaganda detection and operate
as ''black boxes'', providing no interpretable insights into manipulation strategies.
This paper introduces X-Troll, a novel framework that bridges this gap by integrating explainable adapter-based
LLMs with expert-derived linguistic knowledge to detect state-sponsored trolls and
provide human-readable explanations for its decisions. X-Troll incorporates appraisal
theory and propaganda analysis through specialized LoRA adapters, using dynamic gating
to capture campaign-specific discourse patterns in coordinated information operations.
Experiments on real-world data demonstrate that our linguistically-informed approach
shows strong performance compared with both general LLM baselines and existing troll
detection models in accuracy while providing enhanced transparency through expert-grounded
explanations that reveal the specific linguistic strategies used by state-sponsored
actors. X-Troll source code is available at: https://github.com/ltian678/xtroll_source/.
Selective Mixup for Debiasing Question Selection in Computerized Adaptive Testing
- Mi Tian
- Kun Zhang
- Fei Liu
- Jinglong Li
- Yuxin Liao
- Chenxi Bai
- Zhengtao Tan
- Le Wu
- Richang Hong
Computerized Adaptive Testing is a widely used technology for evaluating examinees'
proficiency in online education platforms. By leveraging prior estimates of proficiency
to select questions and updating the estimates iteratively based on responses, it
enables personalized examinee modeling and has attracted substantial attention. Despite
this progress, most existing works focus primarily on improving proficiency estimation
accuracy, while overlooking the selection bias inherent in the adaptive process. Selection
bias arises because the question selection is strongly influenced by the estimated
proficiency, such as assigning easier questions to examinees with lower proficiency
and harder ones to examinees with higher proficiency. Since the selection depends
on prior estimation, this bias propagates into the diagnostic model, which is further
amplified during iterative updates, leading to misaligned and biased predictions.
Moreover, the imbalance in examinees' historical interactions often exacerbates bias
in diagnostic models. To address this issue, we propose a debiasing framework consisting
of two key modules: Cross-Attribute Examinee Retrieval and Selective Mixup-based Regularization.
First, we retrieve balanced examinees with relatively even distributions of correct
and incorrect responses and use them as neutral references for biased examinees. Then,
Mixup is applied between each biased examinee and its matched balanced counterpart
under label consistency. This augmentation enriches the diversity of bias-conflicting
samples and smooths selection boundaries. Finally, extensive experiments on two benchmark
datasets with multiple advanced diagnosis models have been conducted. The results
demonstrate that our method substantially improves the generalization ability of question
selection.
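The Selective Mixup step pairs each biased examinee with a retrieved balanced counterpart and mixes them only when their labels agree. A minimal NumPy sketch (the Beta parameter is an illustrative default):

```python
import numpy as np

def selective_mixup(x_biased, y_biased, x_balanced, y_balanced, alpha=0.4):
    """Mix a biased examinee's features with its balanced counterpart under
    label consistency; label-inconsistent pairs are left untouched."""
    if y_biased != y_balanced:
        return x_biased, y_biased
    lam = np.random.beta(alpha, alpha)
    return lam * x_biased + (1 - lam) * x_balanced, y_biased
```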
CoHN: Context-Aware Hawkes Graph Network for Temporal Knowledge Graph Reasoning
- Xiaowei Tian
- Xiaoyan Zhang
- Xiaofeng Du
- Tianbo Lu
Temporal Knowledge Graphs (TKGs) model dynamic events, and understanding temporal
evolution is crucial for effective reasoning. While existing methods leverage Graph
Neural Networks (GNNs) to model structural dependencies, they often rely on Recurrent
Neural Networks (RNNs) to process sequences of graph structures. They struggle to (1)
incorporate contextual information, (2) explicitly model long-term effects, and (3)
handle in-equidistant data. To address these challenges, we propose Context-Aware
Hawkes Graph Network (CoHN), a novel TKG reasoning approach based on the Hawkes process.
CoHN features a tailored conditional intensity function that models TKG event occurrences.
It is characterized by two additive terms: base intensity and historical influence,
representing spontaneous tendency and influence of past events at in-equidistant time
intervals. Firstly, we design a Contextual Encoder (CE) to encode contextual information
for all entities and compute the base intensity. We then present an attention-based
Evolutionary Encoder that captures local structural information and explicitly models
long-term dependencies across the TKG. A self-exciting fusion module further aggregates
historical evolutionary dependencies at all timestamps to quantify the final historical
influence. Extensive experiments on common benchmarks demonstrate the superiority,
robustness, and efficiency of our method.
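For readers less familiar with Hawkes processes, the two additive terms correspond to a conditional intensity of the textbook form lambda(t) = mu + sum_i a_i * exp(-delta * (t - t_i)). CoHN learns both terms with neural encoders; the sketch below only shows the plain numeric form with an exponential decay kernel, which is an assumption about the kernel shape rather than the paper's exact choice.

```python
import numpy as np

def conditional_intensity(t, base_mu, past_times, past_influence, decay=0.1):
    """Base intensity plus exponentially decayed influence of past events,
    evaluated at arbitrary (in-equidistant) time gaps."""
    past_times = np.asarray(past_times, dtype=float)
    past_influence = np.asarray(past_influence, dtype=float)
    gaps = t - past_times
    return base_mu + float((past_influence * np.exp(-decay * gaps)).sum())

lam = conditional_intensity(10.0, base_mu=0.2,
                            past_times=[1.0, 4.5, 9.0],
                            past_influence=[0.3, 0.2, 0.5])
```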
SupLID: Geometrical Guidance for Out-of-Distribution Detection in Semantic Segmentation
- Nimeshika Udayangani
- Sarah Erfani
- Christopher Leckie
Out-of-Distribution (OOD) detection in semantic segmentation aims to localize anomalous
regions at the pixel level, advancing beyond traditional image-level OOD techniques
to better suit real-world applications such as autonomous driving. Recent literature
has successfully explored the adaptation of commonly used image-level OOD methods-primarily
based on classifier-derived confidence scores (e.g., energy or entropy)-for this pixel-precise
task. However, these methods inherit a set of limitations, including vulnerability
to overconfidence. In this work, we introduce SupLID, a novel framework that effectively
guides classifier-derived OOD scores by exploiting the geometrical structure of the
underlying semantic space, particularly using Local Intrinsic Dimensionality (LID).
While LID effectively characterizes the local structure of high-dimensional data by
analyzing distance distributions, its direct application at the pixel level remains
challenging. To overcome this, SupLID constructs a geometrical coreset that captures
the intrinsic structure of the in-distribution (ID) subspace. It then computes OOD
scores at the superpixel level, enabling both efficient real-time inference and improved
spatial smoothness. We demonstrate that geometrical cues derived from SupLID serve
as a complementary signal to traditional classifier confidence, enhancing the model's
ability to detect diverse OOD scenarios. Designed as a post-hoc scoring method, SupLID
can be seamlessly integrated with any semantic segmentation classifier at deployment
time. Our results demonstrate that SupLID significantly enhances existing classifier-based
OOD scores, achieving state-of-the-art performance across key evaluation metrics,
including AUR, FPR, and AUP. Code is available at https://github.com/hdnugit/SupLID.
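LID is commonly estimated from the distribution of distances to a point's nearest neighbors. The sketch below uses the standard maximum-likelihood estimator over distances to the geometrical coreset and scores a superpixel by the LID of its mean feature; this is an illustrative reading of the pipeline, not the exact SupLID scoring rule.

```python
import numpy as np

def lid_mle(dists):
    """Maximum-likelihood LID estimate from distances to the k nearest points:
    LID = -k / sum_i log(d_i / d_max)."""
    d = np.sort(np.asarray(dists, dtype=float))
    return -len(d) / np.sum(np.log(d / d[-1] + 1e-12))

def superpixel_ood_score(superpixel_feats, id_coreset, k=20):
    """Score one superpixel (array of pixel features) against the ID coreset."""
    z = superpixel_feats.mean(axis=0)
    dists = np.linalg.norm(id_coreset - z, axis=1)
    return lid_mle(np.sort(dists)[:k])
```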
Neural Instrumented Factorization: Learning Dynamic Asset Pricing Factors and Loadings
through Characteristics Control
Asset pricing theory rests on the principle that differences in expected returns across
assets are driven by their exposures to systematic risk factors. Identifying the ''right''
factors-whether observable or latent-remains a central challenge in empirical finance.
Traditional latent factor models offer a parsimonious framework for summarizing information
from hundreds of observable firm characteristics; however, they are typically estimated
solely from return matrices, which limits their ability to capture time-varying, firm-specific
dynamics. This study proposes a novel framework, Neural Instrumented Factorization (NeurIF), that leverages firm characteristics as instruments to learn economically
meaningful and time-varying latent factors. NeurIF integrates spatial and temporal
attention mechanisms to capture nonlinear relationships between firm characteristics
and asset returns, jointly learning both the latent factors and their dynamic loadings.
The model incorporates orthogonality constraints and deviation-based penalties to
ensure the interpretability and alignment of latent factors with observed firm characteristics.
Empirical evaluations on real-world asset pricing data reveal that NeurIF consistently
outperforms several state-of-the-art transformer-based models in return prediction,
with improvements ranging from 1% to 18% in test data. Furthermore, the learned factor
loadings can generate statistically significant long-short portfolio returns and are
not subsumed by other observable factors. The embedded latent factors also exhibit
strong explanatory power across several cross-sectional asset pricing anomalies, highlighting
their economic relevance and robustness.
KIEPrompter: Leveraging Lightweight Models' Predictions for Cost-Effective Key Information
Extraction using Vision LLMs
- Lorenzo Vaiani
- Yihao Ding
- Luca Cagliero
- Jean Lee
- Paolo Garza
- Josiah Poon
- Soyeon Caren Han
Key information extraction (KIE) from visually rich documents, such as receipts and
forms, involves a deep understanding of textual, visual, and layout feature information.
Transformers fine-tuned for KIE achieve state-of-the-art performance but lack generality
and portability across different domains. In contrast, vision large language models
(VLLMs) offer higher flexibility and zero-shot capability but fall short with domain-specific
layout relations unless performing a resource-demanding supervised fine-tuning. To
strike the best compromise between lightweight models and VLLMs, we propose
KIEPrompter, a cost-effective LLM-based KIE approach that leverages the predictions
of lightweight models as external knowledge injected into VLLM prompts. By incorporating
these auxiliary predictions, VLLMs are guided to attend to relevant multimodal content
without ad hoc training. The accuracy results achieved by KIEPrompter in three benchmark
document collections are superior to those of VLLMs in both zero-shot and layout-sensitive
scenarios. We compare various strategies for incorporating lightweight model predictions,
ranging from coarse-grained predictions without explicit confidence scores to fine-grained
per-element network logits. We also demonstrate that our approach is robust to the
absence of specific classes in trained lightweight models, as the VLLMs' pre-training
compensates for the limited generality of lightweight models.
FairAD: Computationally Efficient Fair Graph Clustering via Algebraic Distance
- Minh Phu Vuong
- Young-Ju Lee
- Iván Ojeda-Ruiz
- Chul-Ho Lee
Due to the growing concern about unsavory behaviors of machine learning models toward
certain demographic groups, the notion of 'fairness' has recently drawn much attention
from the community, thereby motivating the study of fairness in graph clustering.
Fair graph clustering aims to partition the set of nodes in a graph into k disjoint
clusters such that the proportion of each protected group within each cluster is consistent
with the proportion of that group in the entire dataset. It is, however, computationally
challenging to incorporate fairness constraints into existing graph clustering algorithms,
particularly for large graphs. To address this problem, we propose FairAD, a computationally
efficient fair graph clustering method. It first constructs a new affinity matrix
based on the notion of algebraic distance such that fairness constraints are imposed.
A graph coarsening process is then performed on this affinity matrix to find representative
nodes that correspond to k clusters. Finally, a constrained minimization problem is
solved to obtain the solution of fair clustering. Experiment results on the modified
stochastic block model and six public datasets show that FairAD can achieve fair clustering
while being up to 40 times faster compared to state-of-the-art fair graph clustering
algorithms.
Variety Is the Spice of Life: Detecting Misinformation with Dynamic Environmental
Representations
- Bing Wang
- Ximing Li
- Yiming Wang
- Changchun Li
- Jiaxu Cui
- Renchu Guan
- Bo Yang
The proliferation of misinformation across diverse social media platforms has drawn
significant attention from both academic and industrial communities due to its detrimental
effects. Accordingly, automatically distinguishing misinformation, dubbed Misinformation
Detection (MD), has become an increasingly active research topic. The mainstream methods
formulate MD as a static learning paradigm, which learns the mapping between the content,
links, and propagation of news articles and the corresponding manual veracity labels.
However, the static assumption is often violated, since in real-world scenarios, the
veracity of news articles may vacillate within the dynamically evolving social environment.
To tackle this problem, we propose a novel framework, namely Misinformation detection
with Dynamic Environmental Representations (MISDER). The basic idea of MISDER lies
in learning a social environmental representation for each period and employing a
temporal model to predict the representation for future periods. In this work, we
specify the temporal model as the LSTM model, continuous dynamics equation, and pre-trained
dynamics system, suggesting three variants of MISDER, namely MISDER-LSTM, MISDER-ODE,
and MISDER-PT, respectively. To evaluate the performance of MISDER, we compare it
to various MD baselines across 2 prevalent datasets, and the experimental results
indicate the effectiveness of our proposed model.
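The MISDER-LSTM variant can be pictured as a sequence model that maps past per-period environment representations to the next period's representation; a minimal PyTorch sketch with arbitrary dimensions:

```python
import torch
import torch.nn as nn

class EnvForecaster(nn.Module):
    """Predict the next period's social environmental representation from past periods."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, dim)

    def forward(self, env_seq):                  # env_seq: (B, T, dim)
        out, _ = self.lstm(env_seq)
        return self.head(out[:, -1])             # (B, dim) for the next period

model = EnvForecaster(dim=32)
next_env = model(torch.randn(4, 12, 32))         # 12 past periods per sample
```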
SPARK: Adaptive Low-Rank Knowledge Graph Modeling in Hybrid Geometric Spaces for Recommendation
- Binhao Wang
- Yutian Xiao
- Maolin Wang
- Zhiqi Li
- Tianshuo Wei
- Ruocheng Guo
- Xiangyu Zhao
Knowledge Graphs (KGs) enhance recommender systems but face challenges from inherent
noise, sparsity, and Euclidean geometry's inadequacy for complex relational structures,
critically impairing representation learning, especially for long-tail entities. Existing
methods also often lack adaptive multi-source signal fusion tailored to item popularity.
This paper introduces SPARK, a novel multi-stage framework systematically tackling
these issues. SPARK first employs Tucker low-rank decomposition to denoise KGs and
generate robust entity representations. Subsequently, an SVD-initialized hybrid geometric
GNN concurrently learns representations in Euclidean and Hyperbolic spaces; the latter
is strategically leveraged for its aptitude in modeling hierarchical structures, effectively
capturing semantic features of sparse, long-tail items. A core contribution is an
item popularity-aware adaptive fusion strategy that dynamically weights signals from
collaborative filtering, refined KG embeddings, and diverse geometric spaces for precise
modeling of both mainstream and long-tail items. Finally, contrastive learning aligns
these multi-source representations. Extensive experiments demonstrate SPARK's significant
superiority over state-of-the-art methods, particularly in improving long-tail item
recommendation, offering a robust, principled approach to knowledge-enhanced recommendation.
Implementation code is anonymously online. https://github.com/Applied-Machine-Learning-Lab/SPARK.
Full-Atom Protein-Protein Interaction Prediction via Atomic Equivariant Attention
Network
- Chunchen Wang
- Cheng Yang
- Wenchuan Yang
- Le Song
- Chuan Shi
Protein-protein Interaction (PPI) prediction, which aims to identify the interactions
between proteins within a biological system, is an important problem in understanding
disease mechanisms and drug discovery. Recently, Equivariant Graph Neural Networks
(E3-GNNs) have emerged as advanced computational models that provide a powerful solution for accurately
predicting PPIs by preserving the geometric integrity of protein interactions. However,
most E3-GNNs model protein interactions at the residue level, potentially neglecting
critical atomic details and side-chain conformations. In this paper, we propose a
novel model, MEANT, designed to adaptively extract atom-level geometric information
from varying numbers of atoms within different residues for PPI prediction. Specifically,
we define a full-atom graph that contains atomic geometry and guides the message passing
under the structure of residues. We also design a geometric relation extractor to
integrate geometric information from different residues and adaptively handle variations
in the number of atoms within each residue. Finally, we adopt the attention mechanism
to update the residue representation and the atomic coordinates within a residue.
Experimental results show that our proposed model, MEANT, significantly outperforms
state-of-the-art methods on three typical PPI prediction tasks. Our code and data
are available on GitHub at https://github.com/BUPT-GAMMA/MEANT.
AdaHet-MKD: An Adaptive Heterogeneous Multi-teacher Knowledge Distillation for Medical
Image Analysis
- Helin Wang
- Wei Du
- Ning Liu
- Qian Li
- Yanyu Xu
- Lizhen Cui
Contrastive Language-Image Pre-training (CLIP) has emerged as an effective framework
for multi-modal representation learning, achieving notable success in diverse tasks
such as medical image analysis. However, CLIP's adoption in medical image applications
is restricted by its significant computational demands, creating implementation challenges
in resource-constrained clinical environments. While knowledge distillation offers
an effective approach for model compression with preserved accuracy, existing methods
suffer from two fundamental limitations. Firstly, existing methods focus on learning
better information from single models while ignoring the fact that student models
can generalize well under the guidance of multiple teachers. Secondly, they overlook
the complementary information in the CLIP model where the text encoder and image encoder
can be leveraged as heterogeneous information to teach one single modality. To tackle
these challenges, we propose an Adaptive Heterogeneous Multi-teacher Knowledge Distillation (AdaHet-MKD) framework for effective knowledge transfer across heterogeneous text-image models
and among multiple teacher models. The key innovations include: (i) adaptively determining
the contribution of each teacher model to specific instances, thereby generating integrated
soft logits, and (ii) enabling the student model to operate independently of the teacher
model's architecture, which enhances flexibility in teacher-student pairings. Experimental
evaluations on publicly available medical datasets demonstrate that our approach has
achieved the state-of-the-art performance compared to baselines.
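As an illustration of the instance-adaptive teacher weighting described above, the following minimal PyTorch sketch integrates soft logits from several heterogeneous teachers with per-instance weights produced by a small gating network; the helper name, the `gate` module, and the temperature `T` are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def adaptive_multi_teacher_kd(student_logits, teacher_logits_list, gate, features, T=4.0):
    """Instance-adaptive multi-teacher distillation (illustrative sketch).

    student_logits: (B, C) logits from the compact student model.
    teacher_logits_list: list of (B, C) logits from heterogeneous teachers.
    gate: small network mapping per-instance features (B, D) -> (B, K) scores.
    """
    # Per-instance teacher weights, normalized with softmax over the K teachers.
    weights = F.softmax(gate(features), dim=-1)                  # (B, K)
    teachers = torch.stack(teacher_logits_list, dim=1)           # (B, K, C)
    # Weighted integration of teacher soft logits into one target distribution.
    integrated = (weights.unsqueeze(-1) * teachers).sum(dim=1)   # (B, C)
    # Standard temperature-scaled KL distillation loss.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(integrated / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return kd_loss
```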
GegenNet: Spectral Convolutional Neural Networks for Link Sign Prediction in Signed
Bipartite Graphs
- Hewen Wang
- Renchi Yang
- Xiaokui Xiao
Given a signed bipartite graph (SBG) G with two disjoint node sets U and V, the goal of link sign prediction is
to predict the signs of potential links connecting U and V based on known positive
and negative edges in G. The majority of existing solutions towards link sign prediction
mainly focus on unipartite signed graphs, which are sub-optimal due to the neglect of node heterogeneity and
unique bipartite characteristics of SBGs. To this end, recent studies adapt graph neural networks to SBGs by introducing message-passing schemes for both inter-partition (U x V) and
intra-partition (U x U or V x V) node pairs. However, the fundamental spectral convolutional
operators were originally designed for positive links in unsigned graphs, and thus,
are not optimal for inferring missing positive or negative links from known ones in
SBGs.
Motivated by this, this paper proposes GegenNet, a novel and effective spectral convolutional
neural network model for link sign prediction in SBGs. In particular, GegenNet achieves
enhanced model capacity and high predictive accuracy through three main technical
contributions: (i) fast and theoretically grounded spectral decomposition techniques
for node feature initialization; (ii) a new spectral graph filter based on the Gegenbauer
polynomial basis; and (iii) multi-layer sign-aware spectral convolutional networks
alternating Gegenbauer polynomial filters with positive and negative edges. Our extensive
empirical studies reveal that GegenNet can achieve significantly superior performance
(up to a gain of 4.28% in AUC and 11.69% in F1) in link sign prediction compared to
11 strong competitors over 6 benchmark SBG datasets.
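For readers unfamiliar with the Gegenbauer polynomial basis mentioned above, the sketch below applies a Gegenbauer-polynomial graph filter via the standard three-term recurrence; the propagation matrix, coefficient list, and the parameter `alpha` are assumptions, and GegenNet's sign-aware, alternating-layer construction is considerably more elaborate.

```python
import torch

def gegenbauer_filter(adj_norm, x, coeffs, alpha=1.0):
    """Apply sum_k coeffs[k] * C_k^alpha(A) @ x with the Gegenbauer recurrence.

    adj_norm: (N, N) normalized propagation matrix with spectrum in [-1, 1].
    x: (N, d) node features; coeffs: list of K+1 filter coefficients.
    Illustrative sketch only.
    """
    c_prev2 = x                                # C_0(A) x = x
    out = coeffs[0] * c_prev2
    if len(coeffs) == 1:
        return out
    c_prev1 = 2.0 * alpha * (adj_norm @ x)     # C_1(A) x = 2 * alpha * A x
    out = out + coeffs[1] * c_prev1
    for n in range(2, len(coeffs)):
        # Three-term recurrence: n C_n = 2(n+alpha-1) A C_{n-1} - (n+2alpha-2) C_{n-2}
        c_cur = (2.0 * (n + alpha - 1.0) * (adj_norm @ c_prev1)
                 - (n + 2.0 * alpha - 2.0) * c_prev2) / n
        out = out + coeffs[n] * c_cur
        c_prev2, c_prev1 = c_prev1, c_cur
    return out
```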
Cequel: Cost-Effective Querying of Large Language Models for Text Clustering
- Hongtao Wang
- Taiyan Zhang
- Renchi Yang
- Jianliang Xu
Text clustering aims to automatically partition a collection of documents into coherent
groups based on their linguistic features. In the literature, this task is formulated
either as metric clustering over pre-trained text embeddings or as graph clustering
based on pairwise similarities derived from an oracle, e.g., a large machine learning
model. Recent advances in large language models (LLMs) have significantly improved
this field by providing high-quality contextualized embeddings and accurate semantic
similarity estimates. However, leveraging LLMs at scale introduces substantial computational
and financial costs due to the large number of required API queries or inference calls.
To address this issue, we propose Cequel, a cost-effective framework that achieves
accurate text clustering under a limited budget of LLM queries. At its core, Cequel
constructs must-link and cannot-link constraints by selectively querying LLMs on informative
text pairs or triplets, identified via our proposed algorithms, EdgeLLM and TriangleLLM.
These constraints are then utilized in a weighted constrained clustering algorithm
to form high-quality clusters. Specifically, EdgeLLM and TriangleLLM employ carefully
designed greedy selection strategies and prompting techniques to identify and extract
informative constraints efficiently. Experiments on multiple benchmark datasets demonstrate
that Cequel consistently outperforms existing methods in unsupervised text clustering
under the same query budget.
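To make the constrained-clustering step concrete, here is a minimal sketch of one assignment sweep of a penalty-based k-means that consumes must-link and cannot-link pairs of the kind EdgeLLM and TriangleLLM could extract from LLM judgments; the penalty weight `lam` and the loop structure are assumptions, not Cequel's exact weighted algorithm.

```python
import numpy as np

def penalized_assign(X, centers, must_link, cannot_link, labels, lam=1.0):
    """One assignment sweep of constraint-penalized k-means (illustrative sketch).

    must_link / cannot_link: lists of (i, j) index pairs, e.g. derived from LLM
    judgments about whether two texts belong to the same topic.
    lam weights the penalty paid for violating a constraint.
    """
    n, k = X.shape[0], centers.shape[0]
    new_labels = labels.copy()
    for i in range(n):
        dists = ((X[i] - centers) ** 2).sum(axis=1)
        penalty = np.zeros(k)
        for a, b in must_link:
            j = b if a == i else (a if b == i else None)
            if j is not None:                       # pay when i is separated from j
                penalty += lam * (np.arange(k) != new_labels[j])
        for a, b in cannot_link:
            j = b if a == i else (a if b == i else None)
            if j is not None:                       # pay when i is merged with j
                penalty += lam * (np.arange(k) == new_labels[j])
        new_labels[i] = int(np.argmin(dists + penalty))
    return new_labels
```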
TableTime: Reformulating Time Series Classification as Training-Free Table Understanding
with Large Language Models
- Jiahao Wang
- Mingyue Cheng
- Qingyang Mao
- Yitong Zhou
- Daoyu Wang
- Qi Liu
- Feiyang Xu
- Xin Li
Large language models (LLMs) have shown promise in multivariate time series classification
(MTSC). To effectively adapt LLMs for MTSC, it is crucial to generate comprehensive
and informative data representations. Most methods utilizing LLMs encode numerical
time series into the model's latent space, aiming to align with the semantic space
of LLMs for more effective learning. Despite effectiveness, we highlight three limitations
that these methods overlook: (1) they struggle to incorporate temporal and channel-specific
information, both of which are essential components of multivariate time series; (2)
aligning the learned representation space with the semantic space of the LLMs proves
to be a significant challenge; (3) they often require task-specific retraining, preventing
training-free inference despite the generalization capabilities of LLMs. To bridge
these gaps, we propose TableTime, which reformulates MTSC as a table understanding
task. Specifically, TableTime introduces the following strategies: (1) utilizing tabular
form to unify the format of time series, facilitating the transition from the model-centric
approach to the data-centric approach; (2) representing time series in text format
to facilitate seamless alignment with the semantic space of LLMs; (3) designing a
knowledge-task dual-driven reasoning framework, TableTime, integrating contextual
information and expert-level reasoning guidance to enhance LLMs' reasoning capabilities
and enable training-free classification. Extensive experiments conducted on 10 publicly
available benchmark datasets from the UEA archive validate the substantial potential
of TableTime to be a new paradigm for MTSC. The code is publicly available. https://github.com/realwangjiahao/TableTime.
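The core reformulation, rendering a multivariate series as a textual table inside a training-free LLM prompt, can be sketched as follows; the prompt wording, column names, and the helper function are hypothetical and do not reproduce TableTime's actual template.

```python
def series_to_table_prompt(sample, channel_names, timestamps, label_set):
    """Render a multivariate time-series sample as a plain-text table prompt.

    Illustrative of casting MTSC as table understanding; the wording and the
    helper itself are hypothetical, not TableTime's actual template.
    """
    header = "time | " + " | ".join(channel_names)
    rows = [header, "-" * len(header)]
    for t, row in zip(timestamps, sample):          # sample: (T, C) nested list
        rows.append(f"{t} | " + " | ".join(f"{v:.3f}" for v in row))
    return (
        "The following table records one multivariate time-series sample.\n"
        + "\n".join(rows)
        + f"\nChoose the most likely class from {label_set} and explain briefly."
    )

print(series_to_table_prompt(
    [[0.12, 3.4], [0.15, 3.1]],
    ["accel_x", "accel_y"],
    ["t0", "t1"],
    ["walking", "running"],
))
```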
Weakly Supervised Fine-grained Span-Level Framework for Chinese Radiology Report Quality
Assurance
- Kaiyu Wang
- Lin Mu
- Zhiyao Yang
- Ximing Li
- Xiaotang Zhou
- Wanfu Gao
- Huimao Zhang
Quality Assurance (QA) for radiology reports refers to judging whether the junior
reports (written by junior doctors) are qualified. The QA scores of one junior report
are given by the senior doctor(s) after reviewing the image and junior report. This process incurs intensive labor costs for senior doctors. Additionally, the QA scores may be inaccurate due to factors such as diagnosis bias and the varying expertise of senior doctors. To address this issue, we propose a Span-level Quality Assurance EvaluaTOR
(Sqator) to mark QA scores automatically. Unlike the common document-level semantic
comparison method, we try to analyze the semantic difference by exploring more fine-grained
text spans. Specifically, Sqator computes QA scores by measuring the importance of revised spans between junior and senior reports, and outputs the final QA scores by
merging all revised span scores. We evaluate Sqator using a collection of 12,013 radiology
reports. Experimental results show that Sqator can achieve competitive QA scores.
Moreover, the importance scores of revised spans are also consistent with the judgments of senior doctors.
ACMCG: A Cost-effective Active Clustering with Minimal Constraint Graph
- Qiu-Yu Wang
- Wen-Bo Xie
- Tao Deng
- Tian Zou
- Xuan-Lin Zhu
- Xun Fu
- Xin Wang
Active clustering enhances traditional semi-supervised clustering by introducing machine-led
interaction, where informative constraints are dynamically selected and posed to humans.
This enables goal-driven interaction and reduces the number of required constraints
for achieving high-quality clustering. In this paper, we propose a newly designed
Active Clustering framework with Minimal Constraint Graph (ACMCG). ACMCG operates on two cooperating tailored sparse graphs: a tree-structured
graph (clustering tree) representing the nested clustering result, and a minimal constraint
graph that supports constraint deduction during iterative refinement. In each refinement
round, (a) the most suspicious edge in the tree is identified for constraint verification;
(b) if a cannot-link constraint is confirmed, a pruning-and-grafting approach is performed
to refine the clustering tree, guided by our proposed constraint deduction strategies;
(c) the constraint is either deduced from the minimal constraint graph using transitive
and probabilistic deduction, or obtained via user interaction when deduction fails.
Extensive experiments across diverse domains demonstrate that ACMCG consistently outperforms
both classical and state-of-the-art methods in accuracy, while significantly reducing
the number of user-provided constraints and maintaining low computational cost, highlighting
its cost-effectiveness in real-world applications.
Strong Forgetting for ALCQ-Ontologies
Forgetting is a non-standard reasoning procedure used to refine an ontology into a
sub-signature by eliminating symbols not included in this subset, addressing fundamental
challenges in knowledge management where ontology refinement and reuse are crucial
for efficient information processing. It has two forms: weak forgetting (aka uniform
interpolation), which preserves entailments within the source language, and strong
forgetting, which additionally ensures model preservation modulo the eliminated symbols.
This makes the latter significantly more challenging to compute. In this paper, we
present the first method for strong role forgetting in description logics with qualified
number restrictions (Q). In particular, the method takes ALCQ-ontologies as input, yielding output either in ALCQ or in ALCQ(∇) by further incorporating the universal role ∇ to avoid information loss. This
preserves model-theoretic properties crucial for applications such as modal correspondence
theory and second-order quantifier elimination. While the method guarantees termination
and soundness, its completeness is inherently constrained by the undecidability of
strong forgetting. However, empirical evaluations on the Oxford-ISG and BioPortal
benchmarks show that this theoretical limitation barely impedes practical utility,
with experimental results demonstrating high success rates and remarkable efficiency.
Spatio-Temporal Wavelet Enhanced Attention Mamba for Stock Price Forecasting
- Shurui Wang
- Wenbo Yan
- Ying Tan
Stock price forecasting remains a critical challenge due to market non-stationarity
and the influence of multiple factors. Existing studies apply frequency domain analysis
methods to mitigate the impacts of non-stationarity by decoupling high- and low-frequency
variation patterns. However, these approaches primarily focus on single series decomposition
while neglecting cross-frequency interactions among different stocks. Moreover, current methods inadequately utilize market index information, a key indicator of overall market trends. In this paper, we propose STEAM, a Spatio-Temporal Wavelet Enhanced Attention Mamba model. We introduce Discrete Wavelet Transform (DWT) to disentangle multi-frequency
temporal features and propose Wavelet Enhanced Attention (WEA) to capture cross-frequency spatial dependencies, effectively leveraging both local and global inter-stock relationships. To extract the synergistic spatio-temporal dependencies in stock data, we design the AMamba module, which integrates WEA into the Mamba-2 architecture. Additionally, to further enhance the model's perception of macro-market conditions, we incorporate the market index as a prefix, guiding predictions with holistic market information in both spatial
and temporal dependencies learning. Extensive experiments across multiple national
stock markets demonstrate that STEAM achieves state-of-the-art forecasting performance.
GraphRCG: Self-Conditioned Graph Generation
- Song Wang
- Zhen Tan
- Xinyu Zhao
- Tianlong Chen
- Huan Liu
- Jundong Li
Graph generation aims to create new graphs that closely align with a target graph
distribution. Existing works often implicitly capture this distribution by aligning
the output of a generator with each training sample. As such, the overview of the
entire distribution is not explicitly captured and used for graph generation. In contrast,
in this work, we propose a novel self-conditioned graph generation framework designed
to explicitly model graph distributions and employ these distributions to guide the
generation process. We first perform self-conditioned modeling to capture the graph
distributions by transforming each graph sample into a low-dimensional representation
and optimizing a representation generator to create new representations reflective
of the learned distribution. Subsequently, we leverage these bootstrapped representations
as self-conditioned guidance for the generation process, thereby facilitating the
generation of graphs that more accurately reflect the learned distributions. We conduct
extensive experiments on generic and molecular graph datasets. Our framework, GraphRCG,
demonstrates superior performance over existing state-of-the-art graph generation
methods in terms of graph quality and fidelity to training data.
LCHGNN: Towards Distributed Hypergraph Neural Network Training Based on Communication
Graphs with Lightweight Communication Optimization
- Taibo Wang
- Yu Gu
- Xinning Cui
- Zhen Song
- Xiaohua Li
- Fangfang Li
Hypergraph Neural Networks (HGNNs) build on Graph Neural Networks (GNNs) by using
hyperedges to capture complex, high-order relationships in data. However, training
HGNNs on large hypergraphs is limited by computational and memory bottlenecks on a
single machine. To overcome this, we propose LCHGNN, a distributed training method
based on a new data structure called the communication graph, which simplifies hypergraph
communication by representing cut hyperedges as vertices for structured message passing.
LCHGNN employs a vertex-centric, hyperedge-replication-based storage scheme and introduces
specialized forward and backward propagation mechanisms tailored for distributed execution.
To mitigate communication overhead, we propose a lightweight optimization strategy
that employs full synchronization in the initial round, followed by lightweight synchronization
in subsequent rounds. Additionally, we present a learnable semi-supervised synchronization
(LSS) aggregation mechanism for adaptive hyperedge selection. Extensive experiments
on benchmark datasets demonstrate that LCHGNN preserves training accuracy while substantially
reducing communication costs and enhancing scalability. This work addresses a critical
gap in distributed HGNN research by delivering a communication-efficient and scalable
training method, thereby facilitating the application of hypergraph learning to large-scale
problems.
MFAE: Multimodal Feature Adaptive Enhancement for Fake News Video Detection
- Wenhao Wang
- Mingxin Li
- Jiao Qiao
- Haotong Du
- Xianghua Li
- Chao Gao
- Zhen Wang
With the rapid global growth of short video platforms, the spread of fake news has
become increasingly prevalent, creating an urgent demand for effective automated detection
methods. Current approaches typically rely on feature extractors to gather information
from multiple modalities and then generate predictions through classifiers. However,
these methods often fail to fully utilize the complex information across all modalities
and overlook the potential for video manipulation, limiting their overall performance.
To tackle these issues, we propose MFAE, a novel framework for Multimodal Feature Adaptive Enhancement for fake news video detection. The framework starts by extracting
semantic and emotional features from the news, which are the basis for generating
coarse multimodal representations. These representations are further refined through
Adaptive Enhancement, a module specifically designed to strengthen the visual and
audio modalities. Subsequently, spatial and temporal features are extracted separately,
with temporal features undergoing additional refinement via a Temporal Enhancement
module. The final result is obtained by feeding the individually enhanced features
into the multimodal feature integration module for interaction. Comprehensive experiments
on two benchmark datasets highlight the exceptional performance of MFAE in detecting
fake news on short video platforms. Specifically, the method achieves accuracy improvements
of 2.21% and 4.35% on FakeSV and FakeTT, respectively.
WDformer: A Wavelet-based Differential Transformer Model for Time Series Forecasting
- Xiaojian Wang
- Chaoli Zhang
- Zhonglong Zheng
- Yunliang Jiang
Time series forecasting has various applications, such as meteorological rainfall
prediction, traffic flow analysis, financial forecasting, and operational load monitoring
for various systems. Due to the sparsity of time series data, relying solely on time-domain
or frequency-domain modeling limits the model's ability to fully leverage multi-domain
information. Moreover, when applied to time series forecasting tasks, traditional
attention mechanisms tend to over-focus on irrelevant historical information, which
may introduce noise into the prediction process, leading to biased results. We propose WDformer, a wavelet-based differential Transformer model. This study employs the wavelet
transform to conduct a multi-resolution analysis of time series data. By leveraging
the advantages of joint representation in the time-frequency domain, it accurately
extracts the key information components that reflect the essential characteristics
of the data. Furthermore, we apply attention mechanisms on inverted dimensions, allowing
the attention mechanism to capture relationships between multiple variables. When
performing attention calculations, we introduced the differential attention mechanism,
which computes the attention score by taking the difference between two separate softmax
attention matrices. This approach enables the model to focus more on important information
and reduce noise. WDformer has achieved state-of-the-art (SOTA) results on multiple
challenging real-world datasets, demonstrating its accuracy and effectiveness. Code
is available at https://github.com/xiaowangbc/WDformer.
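The differential attention mechanism described above, taking the difference of two separate softmax attention maps, can be sketched in a few lines of PyTorch; the projection shapes and the scalar `lam` are illustrative assumptions rather than WDformer's exact parameterization.

```python
import torch
import torch.nn.functional as F

def differential_attention(q1, k1, q2, k2, v, lam=0.5):
    """Attention score as the difference of two softmax maps (illustrative sketch).

    q1, k1, q2, k2: (B, H, L, d) projections; v: (B, H, L, dv).
    lam controls how strongly the second map cancels common-mode noise.
    """
    d = q1.size(-1)
    a1 = F.softmax(q1 @ k1.transpose(-2, -1) / d ** 0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-2, -1) / d ** 0.5, dim=-1)
    # Subtracting the second map suppresses attention that both maps assign
    # to irrelevant history, sharpening focus on informative positions.
    return (a1 - lam * a2) @ v
```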
MGFSG-EE: A Method based on Multi-grained Fusion and Scene Graph Enhancement for Event
Extraction
- Xiaoyu Wang
- Tao Sun
- Gengchen Liu
- Zhi Yang
- Jiahui Liu
- Zimeng Xu
The Multimedia Event Extraction (MEE) task, as a core task in the field of event analysis,
has seen many benefits in downstream applications. Existing MEE methods focus on the
fusion of co-occurring information in images and text during the fusion process, failing
to model the background information correlation between images and text, making it
difficult to extract events and arguments in complex scenarios. Meanwhile, the neglect
of utilizing the interaction information between objects in images leads to the loss
of event information and incomplete argument extraction. To address these issues, we propose a novel method based on Multi-grained Fusion and Scene Graph Enhancement (MGFSG-EE). It introduces a Multi-grained Fusion Module, which captures co-occurring information between image and text through dynamic screening of cross-modal features for coarse-grained fusion, and builds a Multimodal Graph with graph convolutional networks (GCNs) to achieve fine-grained interaction fusion at the vector level, mining semantic associations and contextual information between image and text. Moreover,
MGFSG-EE constructs a Scene Graph to model the spatial and semantic relationships
among objects and uses GCNs to learn rich event representations. On the M2E2 benchmark dataset, MGFSG-EE outperforms existing SOTA baselines: on visual and multimedia event tasks, the F1 score for event trigger extraction is improved by 3.2% and 2.3%, respectively, and for event argument extraction by 2.7% and 2.8%, respectively, verifying the effectiveness of the proposed method.
Rethinking Lipschitzness Data-free Backdoor Defense
- Xinyi Wang
- Zhiyu Zhu
- Zhibo Jin
- Huaming Chen
- Teng Joon Lim
Deep Neural Networks (DNNs) have demonstrated remarkable success across various applications,
yet some studies reveal their vulnerability to backdoor attacks, where attackers manipulate models under specific conditions using triggers, significantly compromising model integrity. Addressing this critical security issue requires robust defence mechanisms
to ensure the reliability of DNN models. However, most existing defence mechanisms
heavily rely on specialized defence datasets, which are often difficult to obtain
due to data privacy and security concerns. This highlights the urgent need for effective
data-free defence strategies. In this work, we propose Lipschitzness Precise Pruning
(LPP), a novel data-free backdoor defence algorithm that leverages the properties
of Lipschitz function to detect and mitigate backdoor vulnerabilities by pruning neurons
with strong backdoor correlations while fine-tuning unaffected neurons. Our approach
optimizes the computation of the Lipschitz constant using dot product properties,
allowing for efficient and precise identification of compromised neurons without the
need for clean defence data. This method addresses the limitations of existing data-free
defences and extends the scope of backdoor mitigation to include fully connected layers,
ensuring comprehensive protection of DNN models. As our approach does not require
data exchange, it can be implemented efficiently and effectively in diverse environments.
Extensive experiments demonstrate that LPP outperforms state-of-the-art defence approaches
without the need for additional defence datasets. We release our code at: https://github.com/LMBTough/LPP
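As a rough illustration of Lipschitz-guided, data-free pruning, the sketch below scores each output channel of a fully connected layer by a weight-norm Lipschitz bound and masks outlier channels; the outlier threshold and per-row-norm criterion are assumptions and are simpler than LPP's dot-product-based computation.

```python
import torch

def prune_high_lipschitz_channels(linear, k_sigma=3.0):
    """Zero out output channels whose Lipschitz-style bound is an outlier.

    For a linear layer, the row norm ||W_i||_2 bounds how strongly channel i
    can amplify an input perturbation; channels with unusually large bounds
    are treated as backdoor-suspicious and masked. Data-free illustrative
    sketch; the threshold rule is an assumption, not LPP's exact criterion.
    """
    with torch.no_grad():
        bounds = linear.weight.norm(dim=1)                 # per-channel bound
        threshold = bounds.mean() + k_sigma * bounds.std()
        keep = bounds <= threshold
        linear.weight.mul_(keep.unsqueeze(1).float())      # prune suspicious rows
        if linear.bias is not None:
            linear.bias.mul_(keep.float())
    return (~keep).nonzero(as_tuple=False).flatten()       # indices pruned
```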
Transformers are Good Clusterers for Lifelong User Behavior Sequence Modeling
- Xingmei Wang
- Shiyao Wang
- Wuchao Li
- Jiaxin Deng
- Song Lu
- Defu Lian
- Guorui Zhou
Modeling user long-term behavior sequences is critical for enhancing Click-Through
Rate (CTR) prediction. Existing methods typically employ two cascaded search units, a General Search Unit (GSU) for rapid retrieval and an Exact Search Unit (ESU) for precise modeling, to balance efficiency and effectiveness. However, they are constrained to recent behaviors
due to computational limitations. Clustering user behaviors offers a potential solution,
enabling GSU to access lifelong behaviors while maintaining inference efficiency,
but current clustering approaches often lack generalizability, or fail to remain effective
in high-dimensional data due to non-end-to-end clustering and recommendation. Given
that centroids in clustering group similar data points based on proximity, similar
to how queries function in transformers, we can integrate the learning of queries
with CTR tasks in an end-to-end manner, shifting clustering from meaningless Euclidean
distances to meaningful semantic distances. Therefore, we propose C-Former, a transformer-based clustering model specifically designed for modeling lifelong
behavior sequences. The C-Former encoder leverages a group of learnable clustering anchor points that access the lifelong
user behaviors to extract personalized interests. Then, the C-Former decoder reconstructs lifelong user behaviors based on the compact output of the encoder.
The reconstruction and orthogonal loss ensure that centroids are informative and diverse
in capturing user preferences. Clustering is further guided by supervisory signals
from CTR, establishing an end-to-end framework. The proposed C-Former achieves linear
time complexity in training with respect to sequence length and significantly reduces
inference latency by directly utilizing cached centroids. Experiments on four benchmark
datasets demonstrate the effectiveness of C-Former for lifelong user behavior sequence
modeling. The code is available at https://github.com/pepsi2222/C-Former.
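A minimal sketch of the encoder-decoder idea, learnable clustering anchors attending over the lifelong sequence with reconstruction and orthogonality regularizers, is shown below; the dimensions, head counts, and loss weights are illustrative assumptions rather than C-Former's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnchorClusterer(nn.Module):
    """Learnable clustering anchors attend over a lifelong behavior sequence.

    Minimal sketch: anchors act as queries (centroids), their outputs summarize
    interests, and a reconstruction plus an orthogonality penalty keep the
    anchors informative and diverse. Sizes and loss weights are assumptions.
    """
    def __init__(self, d_model=64, n_anchors=16, n_heads=4):
        super().__init__()
        self.anchors = nn.Parameter(torch.randn(n_anchors, d_model) * 0.02)
        self.encode = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.decode = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, behaviors):                           # behaviors: (B, L, d)
        bsz = behaviors.size(0)
        queries = self.anchors.unsqueeze(0).expand(bsz, -1, -1)
        summary, _ = self.encode(queries, behaviors, behaviors)    # (B, K, d)
        recon, _ = self.decode(behaviors, summary, summary)        # (B, L, d)
        recon_loss = F.mse_loss(recon, behaviors)
        a = F.normalize(self.anchors, dim=-1)
        eye = torch.eye(a.size(0), device=a.device)
        ortho_loss = ((a @ a.t() - eye) ** 2).mean()
        return summary, recon_loss + 0.1 * ortho_loss
```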
CLUE: Using Large Language Models for Judging Document Usefulness in Web Search Evaluation
- Xingzhu Wang
- Erhan Zhang
- Yiqun Chen
- Jinghan Xuan
- Yucheng Hou
- Yitong Xu
- Ying Nie
- Shuaiqiang Wang
- Dawei Yin
- Jiaxin Mao
The widely adopted Cranfield paradigm fails to adequately capture user satisfaction
due to a weak relevance-satisfaction correlation. Additionally, constructing test
collections incurs high relevance annotation costs. To address these two limitations,
we aim to explore the use of large language models (LLMs) to generate multilevel usefulness
labels. We propose CLUE, a user-centric evaluation method that explicitly incorporates
users' search context and behavior information into LLMs. Inspired by ordinal regression,
it employs a cascade structure tailored for multilevel usefulness judgments. Our study
shows that using CLUE, LLMs can effectively assess usefulness when provided with search
context and behavior, outperforming third-party labeling methods. We also conduct
ablation studies to explore the impact of each component in CLUE. Finally, we utilize
the usefulness labels generated by CLUE to predict user satisfaction. Real-world experiments
reveal that incorporating CLUE's usefulness labels significantly enhances the performance
of the satisfaction prediction model.
Revisiting Trajectories to Road: A New Diffusion Model and A New Dataset with 1,000,000,000
Points
- Yang Wang
- Miaomiao Li
- Jiazhi Ni
Rapid urban growth and road network evolution make it increasingly difficult to maintain
accurate digital maps. Traditional manual or satellite-based updates are often delayed
or insufficiently detailed. The trajectory-to-road (T2R) task addresses these limitations
by leveraging GPS trajectory data to reconstruct up-to-date road networks, providing
a scalable solution for navigation, ride-hailing, and urban planning. Existing T2R
methods face significant challenges due to their reliance on numerous statistical
features and limited generative capabilities. Additionally, current datasets are often
outdated and come from a single mobility source, leading to biased urban dynamics
and poor generalizability. To address these issues, we introduce DiffusionT2R, the
first diffusion-based framework for T2R. DiffusionT2R leverages three key innovations:
a multi-channel trajectory representation to provide fine-grained conditioning for
guiding the denoising process; a Dual-Level Mixture of Filters that enhances feature
extraction at both local and global scales; and a consistency constraint to ensure
spatial alignment with input trajectories, preserving road network realism. We also
present the largest available trajectory dataset with up-to-date road networks, diverse
mobility patterns and high-quality filtering. Experimental results show that DiffusionT2R
outperforms existing methods, delivering accurate, realistic, and generalizable road
networks with improved robustness in real-world scenarios. The dataset TXBJ is available
at https://github.com/ywyangwang/TXBJ.
Generative Data Augmentation in Graph Contrastive Learning for Recommendation
- Yansong Wang
- Qihui Lin
- Junjie Huang
- Tao Jia
Recommendation systems have become indispensable in various online platforms, from
e-commerce to streaming services. A fundamental challenge in this domain is learning
effective embeddings from sparse user-item interactions. While contrastive learning
has recently emerged as a promising solution to this issue, generating augmented views
for contrastive learning through most existing random data augmentation methods often
leads to the alteration of original semantic information. In this paper, we propose
a novel framework, GDA4Rec (Generative Data Augmentation in graph contrastive learning for Recommendation) to generate high-quality augmented views and provide robust self-supervised
signals. Specifically, we employ a noise generation module that leverages deep generative
models to approximate the distribution of original data for data augmentation. Additionally,
GDA4Rec further extracts an item complement matrix to characterize the latent correlations
between items and provide additional self-supervised signals. Lastly, a joint objective
that integrates recommendation, data augmentation, and contrastive learning is used to encourage the model to learn more effective and informative embeddings. Extensive
experiments are conducted on three public datasets to demonstrate the superiority
of the model. The code is available at: https://github.com/MrYansong/GDA4Rec.
Towards Reliable GNNs: Adversarial Calibration Learning for Confidence Estimation
- Yilong Wang
- Jiahao Zhang
- Tianxiang Zhao
- Suhang Wang
Graph neural networks (GNNs) have achieved strong predictive performance across a
range of tasks, yet they often exhibit poor confidence calibration-where the predicted
confidence scores do not accurately reflect the true likelihood of correctness. This
shortcoming raises concerns about their reliability in critical domains such as fraud
detection and risk assessment, where well-calibrated predictions are essential for
sound decision-making. Although several calibration methods have been proposed for
GNNs, our experiments reveal that they tend to focus on global calibration while failing
to generalize across different node groups, such as those defined by degree, class,
or local structural patterns. In some cases, these methods even degrade calibration
performance compared to the original uncalibrated models. To address this limitation,
we introduce AdvCali, a novel framework that adaptively improves calibration across
diverse node groups. AdvCali employs adversarial training to automatically identify
miscalibrated groups and incorporates a differentiable Group Expected Calibration
Error (ECE) loss to refine confidence estimates within them. This enables the model
to adjust its calibration strategy dynamically, without relying on prior knowledge
of which node groups are miscalibrated. Extensive experiments on real-world datasets
show that AdvCali not only improves global calibration but also significantly enhances
calibration within groups defined by feature similarity, graph topology, and connectivity
patterns, outperforming existing approaches.
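A differentiable (soft-binned) group Expected Calibration Error of the kind referenced above might be sketched as follows; the Gaussian soft-binning, the temperature `tau`, and the way group membership enters the loss are assumptions, not the paper's exact formulation.

```python
import torch

def soft_group_ece(conf, correct, group_weight, n_bins=10, tau=0.05):
    """Differentiable group-weighted ECE (illustrative sketch).

    conf: (N,) predicted confidences; correct: (N,) 0/1 correctness;
    group_weight: (N,) soft membership of each node in the group under scrutiny.
    Hard binning is replaced by a Gaussian soft assignment so gradients flow.
    """
    correct = correct.float()
    centers = torch.linspace(0.5 / n_bins, 1 - 0.5 / n_bins, n_bins, device=conf.device)
    assign = torch.softmax(-((conf.unsqueeze(1) - centers) ** 2) / tau, dim=1)  # (N, B)
    w = assign * group_weight.unsqueeze(1)           # group-weighted soft counts
    mass = w.sum(dim=0) + 1e-8
    bin_conf = (w * conf.unsqueeze(1)).sum(dim=0) / mass
    bin_acc = (w * correct.unsqueeze(1)).sum(dim=0) / mass
    return ((mass / mass.sum()) * (bin_conf - bin_acc).abs()).sum()
```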
Understanding the Embedding Models on Hyper-relational Knowledge Graph
- Yubo Wang
- Shimin Di
- Zhili Wang
- Haoyang Li
- Fei Teng
- Hao Xin
- Lei Chen
Recently, Hyper-relational Knowledge Graphs (HKGs) have been proposed as an extension
of traditional Knowledge Graphs (KGs) to better represent real-world facts with additional
qualifiers. As a result, researchers have attempted to adapt classical Knowledge Graph
Embedding (KGE) models for HKGs by designing extra qualifier processing modules. However,
it remains unclear whether the superior performance of Hyper-relational KGE (HKGE)
models arises from their base KGE model or the specially designed extension module.
In this paper, we convert HKGs into the plain KG format at the data level using decomposition methods
and then evaluate several classical KGE models' performance on HKGs. Our results show
that some KGE models achieve comparable performance to HKGE models. Upon further analysis,
we find that the decomposition methods alter the original HKG topology and fail to
fully preserve HKG information. Moreover, we observe that current HKGE models are
either insufficient in capturing the graph's long-range dependency or struggle to
integrate main-triple and qualifier information due to the information compression
issue. To further justify our findings and provide a direction for HKGE research,
we propose FormerGNN, which employs a qualifier integrator to preserve the original
HKG topology, a GNN-based graph encoder to capture the graph's long-range dependencies,
and an improved approach for integrating main-triple and qualifier information to
mitigate compression issues. Our experimental results demonstrate that FormerGNN outperforms
existing HKGE models.
PRIMA: Privacy preserving Multi-dimensional Analytic Approach
- Yufei Wang
- Xiang Cheng
- Pengfei Zhang
- Anxing Wei
Sum query is an important and fundamental operator for online analytical processing.
In this paper, we focus on the process of answering sum queries over data cube, each
of which consists of a collection of cuboids, while satisfying differential privacy
(DP). Existing works fail to process the sum queries in online analytical processing
with high utility due to sum queries' high sensitivity and the noise aggregation:
constructing a base cuboid requires the data curator to answer a workload of linear
sum queries under DP in advance, whose sensitivity will result in a large amount of
DP noise, and the noise will finally be aggregated when constructing the remaining
cuboids. To this end, we present a Differentially PRIvate Multi-dimensional Analytic Approach (PRIMA). In PRIMA, we propose a Symmetric Bounded Sum Query Processing
Method (SBS) which reduces the sensitivity of sum queries by bounding both the maximum
and minimum contribution of each record in the data table in a symmetrical manner.
Moreover, we propose a Hypothesis Testing based Prefix Sum Computing Method (SCOPE)
to compute a base prefix-sum cuboid based on hypothesis testing. By employing the
base prefix-sum cuboid, any remaining cuboid can be constructed with constant pieces
of DP noise aggregated. We conduct experiments on both real-world and synthetic datasets.
Experimental results confirm the effectiveness of PRIMA over existing works.
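The benefit of a prefix-sum base cuboid, namely that any range sum aggregates only a constant number of noise terms, can be illustrated with a one-dimensional Laplace-mechanism sketch; the sensitivity handling here is schematic, and PRIMA's SBS and SCOPE methods perform a far more careful privacy analysis.

```python
import numpy as np

def noisy_prefix_sums(values, epsilon, workload_sensitivity):
    """Release a base of prefix sums with Laplace noise (schematic sketch).

    workload_sensitivity must bound the effect of one record on the whole
    released prefix-sum vector; PRIMA's SBS bounds each record's contribution
    to keep this quantity small.
    """
    prefix = np.cumsum(np.asarray(values, dtype=float))
    noise = np.random.laplace(scale=workload_sensitivity / epsilon, size=prefix.shape)
    return prefix + noise

def range_sum(noisy_prefix, lo, hi):
    """Any range sum is a difference of two prefix sums, so each answer
    aggregates a constant number of noise terms regardless of range size."""
    left = noisy_prefix[lo - 1] if lo > 0 else 0.0
    return noisy_prefix[hi] - left
```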
ORCAS: Obfuscation-Resilient Binary Code Similarity Analysis using Dominance Enhanced
Semantic Graph
- Yufeng Wang
- Yuhong Feng
- Yixuan Cao
- Haoran Li
- Haiyue Feng
- Yifeng Wang
Binary code similarity analysis (BCSA) serves as a foundational technique for binary
analysis tasks such as vulnerability detection and malware identification. Existing
graph based BCSA approaches capture more binary code semantics and demonstrate remarkable
performance. However, when code obfuscation is applied, the unstable control flow
structure degrades their performance. To address this issue, we develop ORCAS, an
Obfuscation-Resilient BCSA model based on Dominance Enhanced Semantic Graph (DESG).
The DESG is an original binary code representation, capturing more binaries' implicit
semantics without control flow structure, including inter-instruction relations (e.g.,
def-use), inter-basic block relations (i.e., dominance and post-dominance), and instruction-basic
block relations. ORCAS takes binary functions from different obfuscation options,
optimization levels, and instruction set architectures as input and scores their semantic
similarity more robustly. Extensive experiments have been conducted on ORCAS against eight baseline approaches over the BinKit dataset. For example, ORCAS achieves an average 12.1% PR-AUC improvement over state-of-the-art approaches when three obfuscation options are combined. In
addition, an original obfuscated real-world vulnerability dataset has been constructed
and released to facilitate a more comprehensive research on obfuscated binary code
analysis. ORCAS outperforms state-of-the-art approaches on this newly released real-world vulnerability dataset by up to 43% in recall.
Empowering Large Language Model for Sequential Recommendation via Multimodal Embeddings
and Semantic IDs
- Yuhao Wang
- Junwei Pan
- Xinhang Li
- Maolin Wang
- Yuan Wang
- Yue Liu
- Dapeng Liu
- Jie Jiang
- Xiangyu Zhao
Sequential recommendation (SR) aims to capture users' dynamic interests and sequential
patterns based on their historical interactions. Recently, the powerful capabilities
of large language models (LLMs) have driven their adoption in SR. However, we identify
two critical challenges in existing LLM-based SR methods: 1) embedding collapse when
incorporating pre-trained collaborative embeddings and 2) catastrophic forgetting
of quantized embeddings when utilizing semantic IDs. These issues hamper model scalability and lead to suboptimal recommendation performance. Therefore, based on
LLMs like Llama3-8B-instruct, we introduce a novel SR framework named MME-SID, which
integrates multimodal embeddings and quantized embeddings to mitigate embedding collapse.
Additionally, we propose a Multimodal Residual Quantized Variational Autoencoder (MM-RQ-VAE)
with maximum mean discrepancy as the reconstruction loss and contrastive learning
for alignment, which effectively preserve intra-modal distance information and capture
inter-modal correlations, respectively. To further alleviate catastrophic forgetting,
we initialize the model with the trained multimodal code embeddings. Finally, we fine-tune
the LLM efficiently using LoRA in a multimodal frequency-aware fusion manner. Extensive
experiments on three public datasets validate the superior performance of MME-SID
thanks to its capability to mitigate embedding collapse and catastrophic forgetting.
The implementation code and datasets are publicly available for reproduction: https://github.com/Applied-Machine-Learning-Lab/MME-SID.
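For reference, a maximum mean discrepancy term of the kind used as the reconstruction loss above can be written compactly with an RBF kernel; the single-kernel form and bandwidth `sigma` are simplifying assumptions.

```python
import torch

def mmd_rbf(x, y, sigma=1.0):
    """Maximum mean discrepancy with a single RBF kernel (illustrative sketch).

    Here x would be the original modality embeddings and y their quantized
    reconstructions, so minimizing MMD preserves intra-modal distance structure.
    """
    def kernel(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()
```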
Beyond Surface Similarity: A Riemannian Hierarchical Ranking Framework for Sociological
Concept Equivalence
- Zeqiang Wang
- Wing Yan Li
- Jon Johnson
- Nishanth Sastry
- Suparna De
Vocabularies such as the European Language Social Science Thesaurus (ELSST) and the
CLOSER ontology are foundational taxonomies capturing the core social science concepts that underpin large-scale longitudinal social science surveys. However,
standard text embeddings often fail to capture the complex hierarchical and relational
structures of the sociological concepts, relying on surface similarity. In this work,
we propose a framework to model these nuances by adapting a large language model based
text embedding model with a learnable diagonal Riemannian metric. This metric allows
for a flexible geometry where dimensions can be scaled to reflect semantic importance.
Additionally, we introduce a Hierarchical Ranking Loss with dynamic margins as the
sole training objective to enforce the multi-level hierarchical constraints (e.g.,
distinguishing 'self' from narrower, broader, or related concepts, and all from 'unrelated'
ones) from ELSST within the Riemannian space, such as ensuring a specific concept
like 'social stratification' is correctly positioned by, for instance, being embedded
closer to 'social inequality' (as its broader, related concept) and substantially
further from an 'unrelated' concept like 'particle physics'. Lastly, we show that
our parameter-efficient approach significantly outperforms strong contrastive learning
and hyperbolic embedding baselines on hierarchical concept retrieval and classification
tasks using the ELSST and CLOSER datasets. Visualizations confirm the learned embedding
space exhibits a clear hierarchical structure. Our work offers a more accurate and
geometrically informed method for representing complex sociological constructs.
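The learnable diagonal Riemannian metric described above can be sketched as a distance module in which each embedding dimension receives a positive, trainable scale; the softplus parameterization is an assumption made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiagonalRiemannianDistance(nn.Module):
    """Distance under a learnable diagonal metric (illustrative sketch).

    Each embedding dimension i receives a positive scale g_i = softplus(w_i),
    so d(x, y) = sqrt(sum_i g_i (x_i - y_i)^2) can stretch or shrink dimensions
    according to their learned semantic importance.
    """
    def __init__(self, dim):
        super().__init__()
        self.raw_scale = nn.Parameter(torch.zeros(dim))

    def forward(self, x, y):
        g = F.softplus(self.raw_scale)
        return torch.sqrt(((x - y) ** 2 * g).sum(dim=-1) + 1e-12)
```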
TimeRAG: Enhancing Complex Temporal Reasoning with Search Engine Augmentation
- Zhao Wang
- Ziliang Zhao
- Zhicheng Dou
While Large Language Models (LLMs) augmented with search engines have achieved remarkable
progress in open-domain question answering, their ability to adapt to a rapidly evolving
world remains limited. A critical challenge lies in the need for complex temporal
reasoning to answer real-world questions. Current Retrieval-Augmented Generation (RAG)
methods primarily focus on retrieving the latest information but often fail to perform
sophisticated temporal reasoning. To address this gap, we propose TimeRAG, a novel
RAG framework designed to dynamically handle complex temporal reasoning tasks. TimeRAG
operates through the iterative collaboration of two modules: (1) a temporal-semantic
Query Decomposition (QD) module, which breaks down the original question into atomic
time-event sub-questions to guide multi-step retrieval, and (2) a time-aware Answer
Generation (AG) module, which analyzes temporal contexts, generates intermediate answers
with confidence scores, and synthesizes the final answer upon reasoning completion.
The system is trained in three stages: (1) time-aware supervised fine-tuning of the
AG module, (2) imitation learning for the QD module to enhance temporal decomposition
ability, and (3) reinforcement learning for end-to-end joint optimization to enhance
temporal coherence across the entire system. Evaluations on three challenging benchmarks
show that TimeRAG significantly outperforms existing methods, particularly on questions
involving fast-changing real-world events and those grounded in false premises that
require detection and correction of outdated or incorrect assumptions.
Forecasting at Full Spectrum: Holistic Multi-Granular Traffic Modeling under High-Throughput
Inference Regimes
- Zhaoyan Wang
- Xiangchi Song
- In-Young Ko
Current intelligent transportation systems rely heavily on accurate traffic
forecasting and swift inference provision to make timely decisions. While Graph Convolutional
Networks (GCNs) have shown benefits in modeling complex traffic dependencies, the
existing GCN-based approaches cannot fully extract and fuse multi-granular spatiotemporal features across various spatial and temporal scales, which has been shown to yield less accurate results. As extracting multi-granular features across
scales has been a promising strategy across domains such as computer vision, natural
language processing, and time-series forecasting, pioneering studies have attempted
to leverage a similar mechanism for spatiotemporal traffic data mining. However, additional
feature extraction branches introduced in prior studies critically increased model
complexity and extended inference time, making it challenging to provide fast forecasts.
In this paper, we propose MultiGran-STGCNFog, an efficient fog distributed inference
system with a novel traffic forecasting model that employs multi-granular spatiotemporal
feature fusion on generated dynamic traffic graphs to fully capture interdependent
traffic dynamics. The proposed scheduling algorithm GA-DPHDS, optimizing layer execution
order and layer-device scheduling scheme simultaneously, contributes to considerable
inference throughput improvement by coordinating heterogeneous fog devices in a pipelined
manner. Extensive experiments on real-world datasets demonstrate the superiority of
the proposed method over selected GCN baselines.
Bridging Thoughts and Words: Graph-Based Intent-Semantic Joint Learning for Fake News
Detection
- Zhengjia Wang
- Qiang Sheng
- Danding Wang
- Beizhe Hu
- Juan Cao
Fake news detection is an important and challenging task for defending online information
integrity. Existing state-of-the-art approaches typically extract news semantic clues,
such as writing patterns that include emotional words, stylistic features, etc. However,
detectors tuned solely to such semantic clues can easily fall into surface detection
patterns, which can shift rapidly in dynamic environments, leading to limited performance
in the evolving news landscape. To address this issue, this paper investigates a novel
perspective by incorporating news intent into fake news detection, bridging intents
and semantics together. The core insight is that by considering news intents, one
can deeply understand the inherent thoughts behind news deception, rather than the
surface patterns within words alone. To achieve this goal, we propose Graph-based
INtent-Semantic joInt moDEling (InSide) for fake news detection, which models deception
clues from both semantic and intent signals via graph-based joint learning. Specifically,
InSide reformulates news semantic and intent signals into heterogeneous graph structures,
enabling long-range context interaction through entity guidance and capturing both
holistic and implementation-level intent via coarse-to-fine intent modeling. To achieve
better alignment between semantics and intents, we further develop a dynamic pathway-based
graph alignment strategy for effective message passing and aggregation across these
signals by establishing a common space. Extensive experiments on four benchmark datasets
demonstrate the superiority of the proposed InSide compared to state-of-the-art methods.
Learning from Graph: Mitigating Label Noise on Graph through Topological Feature Reconstruction
- Zhonghao Wang
- Yuanchen Bei
- Sheng Zhou
- Zhiyao Zhou
- Jiapei Fan
- Hui Xue
- Haishuai Wang
- Jiajun Bu
Graph Neural Networks (GNNs) have shown remarkable performance in modeling graph data.
However, labeling graph data typically relies on unreliable information, leading to noisy node labels. Existing approaches for GNNs under Label Noise (GLN) employ supervision signals beyond noisy labels for robust learning. While empirically effective, they tend to over-rely on supervision signals built upon external assumptions, leading to restricted applicability. In this work, we shift the focus to exploring how to
extract useful information and learn from the graph itself, thus achieving robust
graph learning. From an information theory perspective, we theoretically and empirically
demonstrate that the graph itself contains reliable information for graph learning
under label noise. Based on these insights, we propose the Topological Feature Reconstruction
(TFR) method. Specifically, TFR leverages the fact that the pattern of clean labels
can more accurately reconstruct graph features through topology, while noisy labels
cannot. TFR is a simple and theoretically guaranteed model for robust graph learning
under label noise. We conduct extensive experiments across datasets with varying properties.
The results demonstrate the robustness and broad applicability of our proposed TFR
compared to state-of-the-art baselines. Codes are available at https://github.com/eaglelab-zju/TFR.
Exploration and Visualization of a Legal Knowledge Graph: A Human-Centered Approach
- Sabine Wehnert
- Pramod Kumar Bontha
- Kilian Lüders
- Huu Huong Giang Nguyen
- Ernesto William De Luca
Building applications for users in the legal domain is challenging due to their strict
requirements. In this work, we present a prototype combining search in legal norms,
legal cases, and legal textbooks with traversal options for the references between
these documents, modeled in a knowledge graph in the backend. We conducted a usability
test with law students as one of the main target user groups to evaluate the prototype.
The usability test comprised user surveys, retrieval test tasks, and a law exam. In
our study (n = 20), prototype users solved significantly more retrieval tasks than
controls (M = 12.7 vs. 7.5, p = .006) and rated usability at 62.3 SUS points, identifying
clear advantages in citation traversal and textbook reference tasks. While exam scores
showed no statistically significant difference, qualitative feedback confirmed improved
efficiency and satisfaction. We also compare user performance to Large Language Models
(LLMs) in vanilla and Retrieval-Augmented Generation configurations, motivated by
participant interest in AI-assisted features.
CAGCL: A Community-Aware Graph Contrastive Learning Model for Social Bot Detection
- Kaihang Wei
- Min Teng
- Haotong Du
- Songxin Wang
- Jinhe Zhao
- Chao Gao
Malicious social bot detection is vital for social network security. While graph neural
networks (GNNs) based methods have improved performance by modeling structural information,
they often overlook latent community structures, resulting in homogeneous node representations.
Leveraging community structures, which capture discriminative group-level patterns,
is therefore essential for more robust detection. In this paper, we propose a new
Community-Aware Graph Contrastive Learning (CAGCL) framework for enhanced social bot detection. Specifically, CAGCL first
exploits the latent community structures to uncover the potential group-level patterns.
Then, a dual-perspective community enhancement module is proposed, which strengthens
the structural awareness and reinforces topological consistency within communities,
thereby enabling more distinctive node representations and deeper intra-community
message passing. Finally, a community-aware contrastive learning module is proposed,
which considers nodes within the same community as positive pairs and those from different
communities as negative pairs, enhancing the discriminability of node representations.
Extensive experiments conducted on multiple benchmark datasets demonstrate that CAGCL
consistently outperforms state-of-the-art baselines. The code is available at https://github.com/cgao-comp/.
GCLS2: Towards Efficient Community Detection Using Graph Contrastive Learning with
Structure Semantics
- Qi Wen
- Yiyang Zhang
- Yutong Ye
- Yingbo Zhou
- Nan Zhang
- Xiang Lian
- Mingsong Chen
Due to the power of learning representations from unlabeled graphs, graph contrastive learning (GCL) has shown excellent performance in community detection tasks. Existing GCL-based
methods on the community detection usually focused on learning attribute representations
of individual nodes, which, however, ignores structure semantics of communities (e.g.,
nodes in the same community should be structurally cohesive). Therefore, in this paper,
we consider the community detection under the community structure semantics and propose
an effective framework for graph contrastive learning under structure semantics (GCLS2) to detect communities. To seamlessly integrate interior dense and exterior sparse
characteristics of communities with our contrastive learning strategy, we employ classic
community structures to extract high-level structural views and design a structure
semantic expression module to augment the original structural feature representation.
Moreover, we formulate the structure contrastive loss to optimize the feature representation
of nodes, which can better capture the topology of communities. To adapt to large-scale
networks, we design a high-level graph partitioning (HGP) algorithm that minimizes the community detection loss for GCLS2 online training. It is worth noting that we prove a lower bound on the training of
GCLS2 from the perspective of the information theory, explaining why GCLS2 can learn a more accurate representation of the structure. Extensive experiments
have been conducted on various real-world graph datasets and confirmed that GCLS2 outperforms nine state-of-the-art methods, in terms of the accuracy, modularity,
and efficiency of detecting communities.
DANet: A RAG-inspired Dual Attention Model for Few-shot Time Series Prediction
- Zimo Wen
- Hanwen Hu
- Nan Fang
- Shiyou Qian
- Jian Cao
Practical applications often require forecasting the future states of short time series
(STS) using multiple related long time series (LTS) as auxiliary data, a process known
as few-shot prediction. The primary challenge, given the limited data on STS, is effectively
capturing the pattern similarities between STS and LTS. Current methods, despite notable
advancements, primarily focus on transferring pattern characteristics from LTS to STS without explicitly addressing their similarities at various levels. To overcome this limitation, we propose a novel few-shot time series forecasting model called DANet. Drawing on the Retrieval-Augmented Generation (RAG) framework in large language models, DANet retrieves long and short sequences from LTS that closely resemble STS, thereby enhancing prediction accuracy while simultaneously reducing uncertainty
through this retrieval process. First, we define two metrics to quantify pattern similarities
between STS and LTS, addressing the issue of different representations of the same
pattern due to variations in sequence length. Second, we propose a dual-attention
mechanism which embeds the two similarity metrics to extract and integrate long
and short sequences from LTS across variable and temporal levels for generating predictions.
Our experiments across six scenarios show that DANet significantly outperforms six
state-of-the-art (SOTA) methods.
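The RAG-style retrieval step, finding LTS windows that resemble the STS query, can be sketched with z-normalized window matching; the negative Euclidean score below is a stand-in for DANet's two similarity metrics, not the paper's definitions.

```python
import numpy as np

def retrieve_similar_windows(sts, lts, top_k=3):
    """Retrieve LTS windows most similar to an STS query (illustrative sketch).

    Both the query and each candidate window are z-normalized so patterns with
    different amplitude or offset can still match; the negative Euclidean
    distance stands in for DANet's two similarity metrics.
    """
    def znorm(v):
        v = np.asarray(v, dtype=float)
        return (v - v.mean()) / (v.std() + 1e-8)

    query, width = znorm(sts), len(sts)
    scored = []
    for start in range(len(lts) - width + 1):
        window = znorm(lts[start:start + width])
        scored.append((-np.linalg.norm(query - window), start))
    scored.sort(reverse=True)
    return [start for _, start in scored[:top_k]]
```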
FedFMD: Fairness-Driven Adaptive Aggregation in Federated Learning via Mahalanobis
Distance
- Xiuting Weng
- Lixing Yu
- Shaojie Zhan
- Ruizhi Pu
- Xiaofei Liu
Federated learning (FL) facilitates collaborative global model training without compromising
data privacy. However, data distribution variations among clients inevitably introduce
bias in global updates, impacting model fairness and performance. Existing methods
assign client aggregation weights simply based on dataset size proportions or rely
on substantial assumptions about specific global data distributions such as uniform
label distributions. These approaches inadequately capture the intrinsic impact of
Non-IID data characteristics on model divergence. To address these deficiencies, we
propose a novel adaptive weight allocation algorithm, FedFMD, which leverages the Mahalanobis distance and integrates Task Arithmetic to dynamically assign weights based on client contributions. FedFMD explicitly models task-centric deviations caused by data heterogeneity
without requiring raw data access or prior distribution assumptions. In addition, FedFMD enhances aggregation weight computation through time-decay adjustments guided by historical client performance trends, optimizing both fairness and utility. Extensive
evaluations against six state-of-the-art (SOTA) algorithms and two distance metrics
across three datasets demonstrate the superior performance of FedFMD in fairness and
utility.
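A minimal sketch of Mahalanobis-distance-based aggregation weighting over client task vectors is given below; the softmax temperature, covariance regularizer, and the absence of time-decay adjustment are simplifications relative to FedFMD.

```python
import numpy as np

def mahalanobis_weights(client_updates, temperature=1.0, eps=1e-6):
    """Aggregation weights from Mahalanobis distances of client task vectors.

    client_updates: (K, D) flattened updates (client model minus global model).
    Clients whose update deviates strongly from the collective direction get
    smaller weight. Illustrative sketch, not the exact FedFMD rule.
    """
    mu = client_updates.mean(axis=0)
    centered = client_updates - mu
    cov = centered.T @ centered / max(len(client_updates) - 1, 1)
    cov += eps * np.eye(cov.shape[0])            # regularize for inversion
    inv_cov = np.linalg.inv(cov)
    dist = np.sqrt(np.einsum("kd,df,kf->k", centered, inv_cov, centered))
    logits = -dist / temperature
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()
```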
CHEM: Causally and Hierarchically Explaining Molecules
- Gyeongdong Woo
- Soyoung Cho
- Donghyeon Kim
- Kimoon Na
- Changhyun Kim
- Jinhee Choi
- Jong-June Jeon
Graph Neural Networks (GNNs) have significantly advanced in analyzing graph-structured
data; however, their explainability remains challenging, affecting their applicability
in critical domains such as medicine and pharmacology. In particular, violating the
subgraph structure can degrade model interpretability and generalization performance.
To address this problem, we propose a hierarchical and explainable causal inference-based
GNN. Our model selects features based on explainable subgraph units informed by prior
knowledge. Our method begins by clustering molecules into functional groups via the
BRICS algorithm, then constructing a hierarchical structure at both the node and motif
levels. The proposed model employs a gate module that distills causal features on
the motif level and a loss function that disconnects information flow from non-causal
features to the target level. The classification results on real-world molecular graphs
demonstrate that our model outperforms other causal inference-based GNN models. In
addition, it is confirmed that leveraging molecular docking data effectively identifies
true causal substructures in the proposed model.
ST-Hyper: Learning High-Order Dependencies Across Multiple Spatial-Temporal Scales
for Multivariate Time Series Forecasting
- Binqing Wu
- Jianlong Huang
- Zongjiang Shang
- Ling Chen
In multivariate time series (MTS) forecasting, many deep learning based methods have
been proposed for modeling dependencies at multiple spatial (inter-variate) or temporal
(intra-variate) scales. However, existing methods may fail to model dependencies across
multiple spatial-temporal scales (ST-scales, i.e., scales that jointly consider spatial
and temporal scopes). In this work, we propose ST-Hyper to model the high-order dependencies
across multiple ST-scales through adaptive hypergraph modeling. Specifically, we introduce
a Spatial-Temporal Pyramid Modeling (STPM) module to extract features at multiple
ST-scales. Furthermore, we introduce an Adaptive Hypergraph Modeling (AHM) module
that learns a sparse hypergraph to capture robust high-order dependencies among features.
In addition, these features interact through tri-phase hypergraph propagation, which comprehensively captures multi-scale spatial-temporal dynamics. Experimental
results on six real-world MTS datasets demonstrate that ST-Hyper achieves the state-of-the-art
performance, outperforming the best baselines with an average MAE reduction of 3.8%
and 6.8% for long-term and short-term forecasting, respectively. Code is available
at https://anonymous.4open.science/ST-Hyper-83E7.
MillGNN: Learning Multi-Scale Lead-Lag Dependencies for Multi-Variate Time Series
Forecasting
- Binqing Wu
- Zongjiang Shang
- Jianlong Huang
- Ling Chen
Multi-variate time series (MTS) forecasting is crucial for various applications. Existing
methods have shown promising results owing to their strong ability to capture intra-
and inter-variate dependencies. However, these methods often overlook lead-lag dependencies
at multiple grouping scales, failing to capture hierarchical lead-lag effects in complex
systems. To this end, we propose MillGNN, a novel graph neural network-based method
that learns multiple grouping scale lead-lag dependencies for MTS forecasting, which
can comprehensively capture lead-lag effects considering variate-wise and group-wise
dynamics and decays. Specifically, MillGNN introduces two key innovations: (1) a scale-specific
lead-lag graph learning module that integrates cross-correlation coefficients and
dynamic decaying features derived from real-time inputs and time lags to learn lead-lag
dependencies for each scale, which can model evolving lead-lag dependencies with statistical
interpretability and data-driven flexibility; (2) a hierarchical lead-lag message
passing module that passes lead-lag messages at multiple grouping scales in a structured
way to simultaneously propagate intra- and inter-scale lead-lag effects, which can
capture multi-scale lead-lag effects with a balance of comprehensiveness and efficiency.
Experimental results on 11 datasets demonstrate the superiority of MillGNN for long-term
and short-term MTS forecasting, compared with 16 state-of-the-art methods.
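The scale-specific module above combines cross-correlation with a lag-dependent decay. A hedged sketch of that intuition, assuming plain Pearson cross-correlation at candidate lags and an exponential decay over the lag (not the paper's exact formulation):

```python
import numpy as np

def lead_lag_weight(x, y, max_lag=5, decay=0.5):
    """Return the lag at which y best follows x and the decayed correlation weight."""
    best_lag, best_w = 0, 0.0
    for lag in range(max_lag + 1):
        a, b = (x[:-lag], y[lag:]) if lag else (x, y)
        r = np.corrcoef(a, b)[0, 1]                 # cross-correlation at this lag
        w = abs(r) * np.exp(-decay * lag)           # decaying influence for longer lags
        if w > best_w:
            best_lag, best_w = lag, w
    return best_lag, best_w

t = np.arange(200)
leader = np.sin(0.1 * t)
laggard = np.sin(0.1 * (t - 3)) + 0.05 * np.random.randn(200)  # follows the leader by 3 steps
print(lead_lag_weight(leader, laggard))
```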
STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate
Reasoning
- Chenghao Wu
- Ruiyang Ren
- Junjie Zhang
- Ruirui Wang
- Zhongrui Ma
- Qi Ye
- Wayne Xin Zhao
While modern recommender systems are instrumental in navigating information abundance,
they remain fundamentally limited by static user modeling and reactive decision-making
paradigms. Current large language model (LLM)-based agents inherit these shortcomings
through their overreliance on heuristic pattern matching, yielding recommendations
prone to shallow correlation bias, limited causal inference, and brittleness in sparse-data
scenarios. We introduce STARec, a slow-thinking augmented agent framework that endows
recommender systems with autonomous deliberative reasoning capabilities. Each user
is modeled as an agent with parallel cognitions: fast response for immediate interactions
and slow reasoning that produces chain-of-thought rationales. To cultivate intrinsic
slow thinking, we develop anchored reinforcement training, a two-stage paradigm combining
structured knowledge distillation from advanced reasoning models with preference-aligned
reward shaping. This hybrid approach scaffolds agents in acquiring foundational capabilities
(preference summarization, rationale generation) while enabling dynamic policy adaptation
through simulated feedback loops. Experiments on MovieLens 1M and Amazon CDs benchmarks
demonstrate that STARec achieves substantial performance gains compared with state-of-the-art
baselines, despite using only 0.4% of the full training data.
Spatio-Temporal Forecasting under Open-World Missingness with Adaptive Mixture-of-Experts
- Chenyu Wu
- Zhipeng Ma
- Junbo Zhang
- Songyu Ke
- Yu Zheng
Spatio-temporal forecasting is crucial for sustainable urban development and societal
decision-making. However, real-world spatio-temporal data often exhibit open-world missingness: missing rates and patterns evolve dynamically across time and space, severely disrupting
dependencies and challenging accurate forecasting. Traditional methods universally
overlook the dynamic nature of missingness, resulting in degraded predictive accuracy.
To address this gap, we propose a novel Spatio-Temporal Missing-aware Mixture-of-Experts (STMMoE) architecture, equipped with a three-stage training strategy. STMMoE dynamically adapts
to varying missing rates through a gating mechanism that selects specialized expert
branches. The three-stage training strategy improves end-to-end forecasting performance
by aligning the representations of complete and missing data. Extensive experiments
on two real-world datasets show that our method achieves state-of-the-art performance.
Gravity-GNN: Deep Reinforcement Learning Guided Space Gravity-based Graph Neural Network
- Huaming Wu
- Lei Tian
- Chaogang Tang
- Pengfei Jiao
- Minxian Xu
- Huijun Tang
Graph Neural Networks (GNNs) have demonstrated remarkable capabilities in handling
graph data. Typically, GNNs recursively aggregate node information, including node
features and local topological information, through a message-passing scheme. However,
most existing GNNs are highly sensitive to neighborhood aggregation, and irrelevant
information in the graph topology can lead to inefficient or even invalid node embeddings.
To overcome these challenges, we propose a novel Space Gravity-based Graph Neural
Network (Gravity-GNN) guided by Deep Reinforcement Learning (DRL). In particular,
we introduce a novel similarity measure called "node gravity", inspired by the gravitational
force between particles in space, to compare nodes within graph data. Furthermore,
we employ DRL technology to learn and select the most suitable number of adjacent
nodes for each node. Our experimental results on various real-world datasets demonstrate
that Gravity-GNN outperforms state-of-the-art methods regarding node classification
accuracy, while exhibiting greater robustness against disturbances.
TLCCSP: A Scalable Framework for Enhancing Time Series Forecasting with Time-Lagged
Cross-Correlations
- Jianfei Wu
- Wenmian Yang
- Bingning Liu
- Weijia Jia
Time series forecasting is critical across various domains, such as weather, finance
and real estate forecasting, as accurate forecasts support informed decision-making
and risk mitigation. While recent deep learning models have improved predictive capabilities,
they often overlook time-lagged cross-correlations between related sequences, which
are crucial for capturing complex temporal relationships. To address this, we propose
the Time-Lagged Cross-Correlations-based Sequence Prediction framework (TLCCSP), which
enhances forecasting accuracy by effectively integrating time-lagged cross-correlated
sequences. TLCCSP employs the Sequence Shifted Dynamic Time Warping (SSDTW) algorithm
to capture lagged correlations and a contrastive learning-based encoder to efficiently
approximate SSDTW distances. Experimental results on weather, finance and real estate
time series datasets demonstrate the effectiveness of our framework. On the weather
dataset, SSDTW reduces mean squared error (MSE) by 16.01% compared with single-sequence
methods, while the contrastive learning encoder (CLE) further decreases MSE by 17.88%.
On the stock dataset, SSDTW achieves a 9.95% MSE reduction, and CLE reduces it by
6.13%. For the real estate dataset, SSDTW and CLE reduce MSE by 21.29% and 8.62%,
respectively. Additionally, the contrastive learning approach decreases SSDTW computational
time by approximately 99%, ensuring scalability and real-time applicability across
multiple time series forecasting tasks.
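A minimal, hypothetical sketch of a "sequence shifted" DTW: standard DTW is evaluated after shifting the second series by each candidate lag, and the smallest distance (with its lag) is kept. The authors' SSDTW and its contrastive approximation are more involved; this only illustrates the idea.

```python
import numpy as np

def dtw(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def shifted_dtw(a, b, max_lag=4):
    # try every candidate lag and keep the best (distance, lag) pair
    return min((dtw(a[lag:], b[:len(b) - lag]), lag) for lag in range(max_lag + 1))

x = np.sin(np.linspace(0, 6, 80))
y = np.roll(x, 2)                    # a lagged (circularly shifted) copy of x
print(shifted_dtw(x, y))
```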
Beyond Return Conditioning: Multi-Scale Sequence Modeling and Advantage-Guided Policy
Routing for Offline RL
- Kunbao Wu
- Xinning Zhu
- Yang Qin
- Tieru Wang
- Jianzhou Diao
- Zheng Hu
Return-conditioned supervised learning (RCSL) in offline reinforcement learning (RL)
leverages Transformers to extract behavioral patterns from offline datasets for decision-making.
However, it suffers from inherent limitations in comprehensively capturing multi-scale
temporal relationships in historical trajectories. Moreover, its return-conditioning
mechanism offers limited guidance in exploiting high-quality behavioral patterns,
often resulting in suboptimal action generation during inference. To address these
challenges, we propose the Advantage Decision ConvMamba (ADCM), a method that integrates
multi-scale sequence modeling (MSSM) with advantage policy guidance (APG). ADCM reconstructs
historical sequences through patch partitioning and employs the Mamba architecture together
with causal convolutions to model sparse global dependencies and dense local Markovian
dependencies for behavioral pattern discovery. By incorporating relative advantage
action sampling based on the Mixture-of-Experts (MoE) framework, ADCM prioritizes
high-quality actions during inference, thereby reducing reliance on low-quality behavioral
patterns in the dataset. We evaluate ADCM on multiple offline RL benchmarks from D4RL.
Experimental results show that ADCM achieves significant improvements over baseline
models, with particularly strong performance on suboptimal datasets. The code for
ADCM is available at https://github.com/iTom233/ADCM.git.
Multi-Resource-Aware Admission Control for Online Data Processing
- Ruoyu Wu
- Wei Bao
- Hequn Wang
Online data processing platforms offer free-tier services to attract users, where
the service provider must strategically utilize limited resources to handle large
amounts of low-value requests. Admission control has been leveraged to choose which requests
to process or skip, yet uncertainties such as unknown rewards and request counts, together
with the complicated interdependence among multi-resource consumption, pose further challenges.
To tackle these challenges, we propose a novel admission control solution, the Online
Multi-resource Magician's Admission (OMMA) algorithm, that balances resource consumption
with reward accumulation across online requests, while coordinating the intertwined
consumption of different resources. OMMA is an online algorithm, with its performance
evaluated by its competitive ratio. OMMA attains a competitive ratio of C(1 - K^(-1/2))^M (1 - L^(-1/2))^N, where C is the constant capturing the intrinsic limitation of resource availability, K is the individual resource budget, M is the number of individual resources, L is the joint resource budget, and N is the number of joint resources. The competitive ratio achieved by OMMA is tight,
meaning that it can be achieved by OMMA in certain problem instances. We implement
trace-driven experiments to evaluate the practical performance of OMMA on a real-world
LLM prompt dataset, demonstrating the superior performance of OMMA in online data
processing services.
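A worked illustration of the competitive ratio's shape with made-up parameter values; C, K, M, L, and N are as defined in the abstract, and the numbers below are purely illustrative.

```python
def omma_ratio(C, K, M, L, N):
    # C * (1 - K^(-1/2))^M * (1 - L^(-1/2))^N
    return C * (1 - K ** -0.5) ** M * (1 - L ** -0.5) ** N

# e.g. two individual resources with budget 100 each and one joint resource with budget 400
print(omma_ratio(C=1.0, K=100, M=2, L=400, N=1))   # 0.81 * 0.95 = 0.7695
```

Larger budgets K and L push each factor toward 1, so the guarantee improves as resources become less scarce, which matches the intuition behind the bound.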
Asking Questions with Thoughts: An Efficient Difficulty-Controllable Question Generation
Method with Posterior Knowledge Distillation
- Sixing Wu
- Jiahao Chen
- Yujue Zhou
- Zhijun Yang
- Wei Zhou
Difficulty Controllable Question Generation (DCQG) for reading comprehension learns
to generate questions for measuring the reading abilities of examinees, playing a
crucial role in educational scenarios. This work studies answer-aware DCQG, a challenging
task that requires the generated questions to remain faithful to the assigned answer
and match the desired difficulty level at the same time. To this end, we first propose
an effective two-stage framework, Asking Questions with Thoughts (AQT), to guide a backbone large language model (LLM) to generate questions that are both
faithful and difficulty-aware through conducting in-depth self-thinking. Then, we
introduce a novel Posterior Knowledge Distillation (PKD) to efficiently fine-tune AQT by distilling knowledge from posterior inference. Finally,
to address the scarcity of DCQG datasets, we use an efficient LLM Pretest-based Difficulty Estimation (LP-DE) to automatically construct DCQG datasets from common QG/QA datasets. Extensive experiments
prove that our methods have promising results in terms of both faithfulness and difficulty
awareness.
Empowering Denoising Sequential Recommendation with Large Language Model Embeddings
- Tongzhou Wu
- Yuhao Wang
- Maolin Wang
- Chi Zhang
- Xiangyu Zhao
Sequential recommendation aims to capture user preferences by modeling sequential
patterns in user-item interactions. However, these models are often influenced by
noise such as accidental interactions, leading to suboptimal performance. Therefore,
to reduce the effect of noise, some works propose explicitly identifying and removing
noisy items. However, we find that simply relying on collaborative information may
result in an over-denoising problem, especially for cold items. To overcome these
limitations, we propose a novel framework: Interest Alignment for Denoising Sequential
Recommendation (IADSR) which integrates both collaborative and semantic information.
Specifically, IADSR comprises two stages: in the first stage, we obtain the
collaborative and semantic embeddings of each item from a traditional sequential recommendation
model and an LLM, respectively. In the second stage, we align the collaborative and
semantic embeddings and then identify noise in the interaction sequence based on long-term
and short-term interests captured in the collaborative and semantic modalities. Our
extensive experiments on four public datasets validate the effectiveness of the proposed
framework and its compatibility with different sequential recommendation systems.
The code and data are released for reproducibility: https://github.com/Applied-Machine-Learning-Lab/IADSR.
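A hedged sketch of interest-aligned denoising in the spirit described above: an item is flagged as noisy only when it is inconsistent with the user's long-term and short-term interests in both the collaborative and the semantic (LLM) embedding space. The thresholds and interest definitions here are illustrative, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def noise_mask(collab_seq, sem_seq, tau=0.1):
    # *_seq: (seq_len, d) item embeddings of one user's interaction sequence
    def score(seq):
        long_term = seq.mean(dim=0, keepdim=True)            # long-term interest
        short_term = seq[-3:].mean(dim=0, keepdim=True)      # recent (short-term) interest
        return torch.maximum(F.cosine_similarity(seq, long_term),
                             F.cosine_similarity(seq, short_term))
    # noisy only if the item is misaligned in *both* modalities
    return (score(collab_seq) < tau) & (score(sem_seq) < tau)

collab = torch.randn(10, 32)   # collaborative embeddings from a sequential recommender
sem = torch.randn(10, 64)      # semantic embeddings from an LLM
print(noise_mask(collab, sem)) # True -> candidate for removal from the sequence
```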
Learning Conditional Probability Distributions for Robust Probabilistic Inference
in Bayesian Network
- Xinran Wu
- Kun Yue
- Huashuai Liu
- Liang Duan
The Bayesian Network (BN) has been widely employed in many applications, such as medical
diagnosis, due to its ability to handle probabilistic inference. Real-world inference
tasks in BN cannot be robustly processed by classic search-based inference algorithms,
since the conditional probabilities w.r.t. given arbitrary evidence values may be
missing (i.e., not included) in the conditional probability tables (CPTs). Most of
the existing methods, relying on imputation models, density estimation models or deep
neural networks, cannot accurately learn these missing probabilities. To this end,
we combine the ideas of learning and search for robust probabilistic inference
in BN. Firstly, we decompose the probabilistic inference task into missing and existing
probability factors, ensuring the consistency of their probability spaces. Secondly,
we define the Wasserstein distance between missing and existing probability factors,
and incorporate the idea of generative adversarial network to obtain missing probability
factors with the minimal Wasserstein distance. Finally, we give the algorithm for
robust probabilistic inferences with arbitrary evidence values, which could also be
used to deal with the probabilistic inferences with arbitrary query values. Extensive
experiments on synthetic and real-world datasets are conducted to demonstrate the
superiority of our proposed method.
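A minimal example of the quantity the abstract minimizes: the Wasserstein-1 distance between two discrete probability factors over the same ordered support, computed as the L1 distance between their CDFs. This only illustrates the objective, not the adversarial training loop used to learn the missing factors.

```python
import numpy as np

def wasserstein_1(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.abs(np.cumsum(p - q)).sum()     # assumes unit spacing between support points

existing_factor = [0.5, 0.3, 0.2]      # P(X | an observed evidence value), from the CPT
candidate_missing = [0.4, 0.4, 0.2]    # learned factor for an unseen evidence value
print(wasserstein_1(existing_factor, candidate_missing))   # 0.1 -> smaller is better
```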
Adaptive Context-Infused Performance Evaluator for Iterative Feature Space Optimization
- Yanping Wu
- Yanyong Huang
- Zijun Yao
- Yanjie Fu
- Kunpeng Liu
- Xiao Luo
- Dongjie Wang
Iterative feature space optimization involves continuously evaluating and refining
the feature space to improve downstream task performance. However, existing methods
commonly suffer from three major limitations: 1) ignoring differences between samples
leads to evaluation bias; 2) the feature space is overly tailored to specific models,
resulting in overfitting and poor generalization; and 3) retraining the evaluator
from scratch in each iteration significantly reduces overall efficiency. To bridge
these gaps, we introduce EASE (gEneralized Adaptive feature Space Evaluator), a generalized
framework for efficient and objective evaluation of iteratively generated feature
spaces. This framework includes two key components: Feature-Sample Subspace Generator
and Contextual Attention Evaluator. The first component aims to mitigate evaluation
bias by decoupling the information distribution within the feature space. To achieve
this, based on feedback from the subsequent evaluator, we identify the samples most
challenging for evaluation and the features most relevant to prediction tasks. The
second component intends to incrementally capture evolving patterns of the feature
space for efficient evaluation. Specifically, we propose a weighted-sharing multi-head
attention mechanism to encode the feature space into an embedding vector for evaluation,
and update the evaluator incrementally to retain prior knowledge while incorporating
new information. Extensive experiments on fifteen public datasets demonstrate the
effectiveness of EASE. We have released our code and data to the public.
Robust Multi-Label Learning with Instance-Dependent Label Noise
- You Wu
- Yabo Shi
- Yizhang Zou
- Peipei Li
Multi-label learning focuses on tasks where each instance is associated with multiple
labels. Due to the high cost of obtaining accurate annotations, real-world multi-label
datasets often contain noisy labels from crowdsourcing or automated annotation. Noisy
multi-label learning has hence been studied to address the label noise problem. However,
existing noisy multi-label methods struggle to handle instance-dependent noise (IDN),
which is a complex and common type of label noise in practical applications, severely
affecting the reliability of multi-label learning. In contrast, existing methods designed
for single-label IDN cannot be directly applied to multi-label data. To address these
challenges, we propose Robust multi-label learning with Instance-Dependent label noisE
(RIDE), a framework for multi-label learning with IDN. Specifically, RIDE first decomposes
the observed label matrix into clean and noisy components via a joint low-rank and
sparse decomposition. Secondly, a linear sparse mapping from feature space to label
space is introduced to explicitly model how instance features induce IDN. Thirdly,
to further improve the denoising accuracy, RIDE estimates a noise suppression coefficient
for each sample, which weights the sparse regularization term of the noise decomposition.
In addition, theoretical analysis is provided to derive upper bounds on the noise
estimation and generalization errors. Extensive experiments on benchmark multi-label
datasets with varying noise rates show that RIDE outperforms state-of-the-art methods.
The code is available at https://github.com/View5U/RIDE.
FedGC: Contrastive-enhanced Subgraph Federated Learning with Grouping Pseudo-Label
- Keao Xi
- Nannan Wu
- Yiming Zhao
- Wenjun Wang
Graph structures are widely used to model relational data. In many real-world applications,
each data client usually holds only a partial subgraph of the original graph, and
privacy concerns limit data exchange between clients. This decentralized data distribution
often degrades the effectiveness of conventional Graph Neural Networks (GNNs). Recently,
subgraph federated learning has been proposed to enable subgraph data collaborative
training without compromising privacy. However, two critical challenges remain: (1)
Missing links between subgraphs from different clients significantly hinder the message-passing
process in GNNs. (2) Varying data distributions across subgraphs lead to the non-IID
issue (e.g., node label skews), which requires personalization of local clients in
subgraph federated learning scenarios. To address these challenges, we propose FedGC,
a novel subgraph federated method that combines pseudo label-based client grouping
with Local-Global Contrastive Tasks. Specifically, FedGC initializes a random graph
on the server, leverages predicted pseudo label distributions to group clients, and
assigns aggregation weights based on the similarity of these distributions. Additionally,
FedGC incorporates Local-Global Contrastive tasks into the local client learning process
to achieve personalized client parameter updates. By adjusting the balance between
the local supervision task and the contrastive task, FedGC enables each client to effectively
control the balance of local and global information. Extensive experiments on six
real-world datasets that cover citation networks and social networks validate the
superior performance of FedGC compared to state-of-the-art baselines.
GraphIAM: Two-Stage Algorithm for Improving Class-Imbalanced Node Classification on
Attribute-Missing Graphs
- Riting Xia
- Chunxu Zhang
- Xueyan Liu
- Anchen Li
- Yan Zhang
Addressing class-imbalanced graphs is a challenging task due to the involvement of
both node attributes and graph structures. Existing works on class-imbalanced graphs
simply assume that all node attributes are available. However, in real-world graphs,
many nodes may lack attributes due to privacy issues or missing data, making class-imbalanced
graph learning more challenging. In this paper, we propose GraphIAM, a novel two-stage
algorithm for improving class-imbalanced node classification on attribute-missing
graphs. In the pre-training phase, GraphIAM adopts graph contrastive learning with
oversampling to tackle both attribute-missing and class-imbalanced issues. During
fine-tuning, an adapter mechanism is introduced to learn node representations, alleviating
the generalization gap between pre-training and downstream tasks. Experimental results
on benchmark datasets demonstrate that our method achieves state-of-the-art performance,
outperforming class-imbalanced graph learning approaches by 5% in F-score on graphs
with severe attribute missingness.
Dual-Space Masked Reconstruction for Robust Self-Supervised Human Activity Recognition
- Shuo Xiao
- Jiukai Deng
- Chaogang Tang
- Zhenzhen Huang
Human Activity Recognition (HAR) based on wearable sensors faces critical challenges,
including limited labeled data, distribution shifts, and sensitivity to sensor noise.
To address these issues, this paper proposes a novel self-supervised learning (SSL)
framework that leverages dual-space masked reconstruction to learn robust and generalizable
representations for sensor-based HAR. Specifically, we design a brand-new pretext
task in the pre-training stage, which learns representations by reconstructing the
original signals with partial masks in both the original space and the latent space.
For the reconstruction task in the original space, we adopt an approach similar to
Masked Autoencoders (MAE). In the latent space, we introduce a Spectral block to extract
more discriminative representations from the time-domain and frequency-domain information.
Then, we implement feature-level reconstruction using a Mean Teacher Network. The
feature extractor trained through this pretext task is subsequently utilized for downstream
challenging HAR tasks. Experiments on four public datasets (MotionSense, UCI-HAR,
PAMAP2 and RealWorld) demonstrate the framework's superiority over state-of-the-art
SSL methods, achieving an average 2.45% improvement in Macro F1-score under fine-tuning
protocols. This work achieves excellent activity recognition performance in label-scarce
and non-IID data scenarios that are closer to the real world by unifying local and global
signal interaction. It provides a scalable solution for noise-robust activity recognition
in heterogeneous environments, thereby advancing HAR.
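An illustrative sketch of the masking step used by MAE-style pretext tasks on sensor windows: the signal is split into patches and a random subset is zeroed out, after which an encoder/decoder (omitted here) reconstructs the masked parts. The patch size and mask ratio below are hypothetical.

```python
import torch

def mask_patches(x, patch_len=16, mask_ratio=0.5):
    # x: (channels, time) one windowed accelerometer/gyroscope sample
    c, t = x.shape
    patches = x.reshape(c, t // patch_len, patch_len)
    n = patches.shape[1]
    masked_idx = torch.randperm(n)[: int(n * mask_ratio)]
    corrupted = patches.clone()
    corrupted[:, masked_idx] = 0.0                       # remove masked patches
    return corrupted.reshape(c, t), masked_idx           # targets are the original masked patches

window = torch.randn(6, 128)                             # 6 sensor channels, 128 timesteps
corrupted, idx = mask_patches(window)
print(corrupted.shape, idx.tolist())
```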
Frequency-Domain Disentanglement-Fusion and Dual Contrastive Learning for Sequential
Recommendation
- Shuo Xiao
- Jingtao Zhang
- Chaogang Tang
- Zhenzhen Huang
Sequential recommendation (SR) aims to provide personalized recommendations by capturing
behavioral intents from existing user interaction sequences. Most previous studies
are based on attention mechanisms; however, these approaches suffer from inherent
over-smoothing issues that limit their ability to capture transient behavioral signals
reflecting the user's immediate intents in interaction sequences. Recently, frequency-domain
analysis methods based on the Fourier transform have garnered significant attention
in the sequential recommendation domain. By applying the Fourier transform, interaction
sequences can be mapped to the frequency domain, enabling direct analysis and targeted
manipulation of distinct frequency components. In addition to the inherent limitations
of self-attention mechanisms, sequential recommendation faces persistent challenges
such as data sparsity and noise. To address these issues, we propose Frequency-Domain
Disentanglement-Fusion and Dual Contrastive Learning for Sequential Recommendation
(FDCLRec). FDCLRec replaces self-attention mechanisms with a frequency-domain adaptive
filtering module, which decouples sequence patterns into distinct high-/low-frequency
components and synthesizes comprehensive sequence representations through adaptively
weighted fusion. In addition, two auxiliary contrastive learning tasks (augmented-view
contrasting and same-target sequence contrastive learning) are strategically integrated
to alleviate data sparsity and interaction noise. Extensive experiments on four real-world
datasets demonstrate that our model outperforms baseline methods.
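A hedged sketch of frequency-domain disentanglement of an item-embedding sequence: apply an rFFT, split the spectrum into low- and high-frequency bands at a cutoff, inverse-transform each band, and fuse them with learnable weights. The cutoff and the fusion form are illustrative choices, not the paper's exact filter design.

```python
import torch
import torch.nn as nn

class FreqDisentangle(nn.Module):
    def __init__(self, cutoff: int = 4):
        super().__init__()
        self.cutoff = cutoff
        self.w = nn.Parameter(torch.tensor([0.5, 0.5]))    # adaptive fusion weights

    def forward(self, x):                                   # x: (seq_len, d) sequence embeddings
        spec = torch.fft.rfft(x, dim=0)
        low, high = spec.clone(), spec.clone()
        low[self.cutoff:] = 0                               # slow, stable preference component
        high[: self.cutoff] = 0                             # transient, immediate-intent component
        x_low = torch.fft.irfft(low, n=x.size(0), dim=0)
        x_high = torch.fft.irfft(high, n=x.size(0), dim=0)
        a = torch.softmax(self.w, dim=0)
        return a[0] * x_low + a[1] * x_high                 # adaptively weighted fusion

seq = torch.randn(50, 64)
print(FreqDisentangle()(seq).shape)                         # torch.Size([50, 64])
```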
Efficient Knowledge Graph Unlearning with Zeroth-order Information
- Yang Xiao
- Ruimeng Ye
- Bohan Liu
- Xiaolong Ma
- Bo Hui
Due to regulations like the Right to be Forgotten, there is growing demand for removing
training data and its influence from models. Since full retraining is costly, various
machine unlearning methods have been proposed. In this paper, we first present an
efficient knowledge graph (KG) unlearning algorithm. We remark that KG unlearning
is nontrivial due to the distinctive structure of KG and the semantic relations between
entities. Also, unlearning by estimating the influence of removed components incurs
significant computational overhead when applied to large-scale knowledge graphs. To
this end, we define an influence function for KG unlearning and propose to approximate
the model's sensitivity without expensive computation of first-order and second-order
derivatives for parameter updates. Specifically, we use Taylor expansion to estimate
the parameter changes caused by data removal. Given that the first-order gradients
and second-order derivatives dominate the computational load, we use the Fisher matrices
and zeroth-order optimization to approximate the inverse-Hessian vector product without
constructing the computational graphs. Our experimental results demonstrate that the
proposed method outperforms other state-of-the-art graph unlearning baselines significantly
in terms of unlearning efficiency and unlearning quality. Our code is released at
https://github.com/NKUShaw/ZOWFKGIF.
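A minimal sketch of a zeroth-order (finite-difference) gradient estimate, the kind of derivative-free signal the abstract leans on to avoid building autograd graphs. The averaging over random probe directions is SPSA-style, and the quadratic loss below is a stand-in, not the paper's KG objective.

```python
import numpy as np

def zeroth_order_grad(loss_fn, theta, n_dirs=64, mu=1e-3):
    d = theta.size
    grad = np.zeros(d)
    for _ in range(n_dirs):
        u = np.random.randn(d)                              # random probe direction
        delta = loss_fn(theta + mu * u) - loss_fn(theta - mu * u)
        grad += (delta / (2 * mu)) * u                      # directional-derivative estimate
    return grad / n_dirs

loss = lambda w: np.sum((w - 1.0) ** 2)                     # toy quadratic loss
w = np.zeros(5)
print(zeroth_order_grad(loss, w))                           # roughly the true gradient [-2, ..., -2]
```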
A Robust and High-Efficiency Active Clustering Framework with Multi-User Collaboration
- Wen-Bo Xie
- Tian Zou
- Tao Deng
- Xuan-Lin Zhu
- Xun Fu
- Qiu-Yu Wang
- Bin Chen
- Xin Wang
Active constraint-based clustering enhances semi-supervised clustering through a machine-led
interaction process. This approach dynamically selects the most informative constraints
to query, minimizing the number of human annotations required. Existing methods face
three key challenges in real-world applications: scalability, timeliness, and robustness
against user annotation errors. In this work, we propose a robust and high-efficiency
Active Clustering framework with Multi-user Collaboration (ACMC). ACMC constructs
a diffusion tree using the nearest-neighbor technique and employs a multi-user online
collaboration framework to iteratively refine clustering results. In each iteration:
(a) nodes with high uncertainty and representativeness are selected in batch; (b)
well-designed multi-user asynchronous query categorizes selected nodes using neighborhood
sets, reducing individual workloads and improving overall timeliness; (c) user-provided
constraints and newly discovered categories are synchronized, with user confidences
dynamically updated to enhance robustness against erroneous annotations; (d) categorized
nodes, stored in neighborhood sets, serve as sources in the diffusion tree to refine
the clusters. Experimental results demonstrate that ACMC outperforms baseline methods
in terms of clustering quality, scalability, and robustness against user annotation
errors.
Federated Approximate Query Processing Based on Deep Models
- Yutong Xie
- Qingzhi Ma
- Lei Zhao
- An Liu
Data isolation poses a significant challenge to efficient big data query processing,
as data providers are often reluctant to share their raw data due to security concerns.
Current federated query systems address this issue by employing Secure Multi-Party
Computation (SMC) and Differential Privacy (DP) to facilitate secure and collaborative
computation. However, these privacy-preserving methods rely on cryptographic protocols,
which introduce substantial computational overhead, slowing query processing by up
to 1,000 times compared to plaintext queries. While sampling methods have been explored
to enhance federated query systems, they frequently fail to strike a balance between
accuracy and speed. To address the limitations above, we propose a secure federated approximate query system based on a deep classifier (SAQDC). This system utilizes deep learning techniques to accelerate query processing
while integrating SMC and Differential Privacy to achieve an optimal balance between
privacy and efficiency. Each data provider trains classifiers using Multi-Layer
Perceptron (MLP) and Deep Set architectures, which predict query relative errors across
different modules. Based on the prediction errors generated by the classifier, queries
are assigned to the most appropriate approximate query model and the differential
privacy parameters are adjusted to enhance query accuracy. This approach enhances
query speed, preserves accuracy, and effectively mitigates malicious differential
privacy attacks. We demonstrate SAQDC's superior performance through extensive experiments
on three large-scale datasets.
Lead-LagNet: Exploiting Lead-Lag Dependencies for Cross-Series Temporal Prediction
- Zhilong Xie
- Shaofei Shen
- Jiwen Huang
- Rui Cheng
- Qing Li
In many real-world systems, the evolution of one time series often leads or lags that
of its related peers rather than moving in perfect synchrony. Graph Neural Networks
(GNNs) are widely used to model such inter-series lead-lag relationships, representing
entities as graph nodes with time-series attributes. However, existing methods typically
collapse temporal information into discrete points and adopt uniform messaging mechanisms
assuming synchronized upward/downward effects and identical time lags among related
peers, which are often inconsistent with real-world dynamics. Furthermore, stacking
GNN layers to capture multi-hop influences reduces interpretability, hindering understanding
of underlying dynamics. To address these issues, we propose the Lead-LagNet, a framework
designed to capture diverse cross-series propagation patterns with lead-lag phenomenon
in time series. The Lead-LagNet identifies meaningful subsequences in time series
and employs a gating mechanism to establish lead-lag connections, enabling the model
to uncover complex influencing patterns without relying on predefined relationships.
By decoupling the linear messaging process from non-linear feature extraction, the
proposed Lead-LagNet enhances both modeling flexibility and interpretability. Experimental
evaluation of both synthetic tasks and real-world datasets demonstrates the superiority
of Lead-LagNet over state-of-the-art algorithms, including BiGRU, SFM, TGC, FinGAT
and ADGAT. Our code and data are available at https://github.com/FICLAB/LeadLagNet.
LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking
- Amy Xin
- Yunjia Qi
- Zijun Yao
- Fangwei Zhu
- Kaisheng Zeng
- Bin Xu
- Lei Hou
- Juanzi Li
Specialized entity linking (EL) models are trained to map mentions to unique
knowledge base (KB) entities according to a given context. However, specialized EL
models struggle to disambiguate long-tail entities due to their limited training data.
Meanwhile, extensively pre-trained large language models (LLMs) possess broader knowledge
of uncommon entities. Yet, with a lack of specialized EL training, LLMs frequently
fail to generate accurate KB entity names, limiting their standalone effectiveness
in EL. With the observation that LLMs are more adept at context generation instead
of EL execution, we introduce LLM-Augmented Entity Linking (LLMAEL), the first framework
to enhance specialized EL models with LLM data augmentation. LLMAEL leverages off-the-shelf,
tuning-free LLMs as context augmenters, generating entity descriptions to serve as
additional input for specialized EL models. Experiments show that LLMAEL sets new
state-of-the-art results across 6 widely adopted EL benchmarks: compared to prior
methods that integrate tuning-free LLMs into EL, LLMAEL achieves an absolute 8.9%
gain in EL accuracy. We release our code and datasets.
Multi-Turn Interactions for Text-to-SQL with Large Language Models
- Guanming Xiong
- Junwei Bao
- Hongfei Jiang
- Yang Song
- Wen Zhao
This study explores text-to-SQL parsing by leveraging the powerful reasoning capabilities
of large language models (LLMs). Despite recent advancements, existing LLM-based methods
are still inefficient and struggle to handle cases with wide tables effectively. Furthermore,
current interaction-based approaches either lack a step-by-step, interpretable SQL
generation process or fail to provide a universally applicable interaction design.
To address these challenges, we introduce Interactive-T2S, a framework that generates
SQL queries through direct interactions with databases. This framework includes four
general tools that facilitate proactive and efficient information retrieval by the
LLM. Additionally, we have developed detailed exemplars to demonstrate the step-wise
reasoning processes within our framework. Our approach achieves advanced performance
on the Spider and BIRD datasets as well as their variants. Notably, we obtain state-of-the-art
results on the BIRD leaderboard under the setting without oracle knowledge, demonstrating
the effectiveness of our method. Code and data are available at: https://github.com/JimXiongGM/Interactive-Text-to-SQL.
Interactive Text-to-Visualization: Refining Visualization Outputs Through Natural
Language User Feedback
- Xubang Xiong
- Raymond Chi-Wing Wong
- Yuanfeng Song
Data visualization (DV) is of great significance to data analysis applications, revealing
hidden patterns and presenting insightful views of the data. The task of Text-to-Vis, which
takes text as input and generates data visualizations, was proposed to lower the threshold
for generating DVs. However, the existing methods only view this problem as a one-shot
mapping problem, directly outputting the final DVs without considering any user feedback
for refining the generated DVs. Motivated by this, a more interactive scenario is
investigated, where users could provide natural language feedback to refine the generated
DVs. The scenario is formulated as the Text-to-Vis with Feedback problem. A new dataset
is also created to further study the problem, which contains the user utterance, database
schema, generated DVs, the natural language feedback and the refined DVs. A large
language model (LLM) based framework named Vis-Edit is designed for handling this
task, including schema linking, clause location, clause generation, merger and self-consistency.
Eventually, extensive experiments reveal the effectiveness of Vis-Edit.
ESED: Emotion-Specific Evidence Decomposition for Uncertainty-Aware Multimodal Emotion
Recognition in Conversation
- Zechang Xiong
- Zhenyan Ji
- Wenkang Kong
- Jiuqian Dai
- Shen Yin
Multimodal emotion recognition in conversations is inherently challenging due to ambiguous
cues, modality conflicts, and temporal dynamics, all of which contribute to complex
and diverse uncertainty sources. While some recent methods incorporate uncertainty
modeling, they often focus on overall prediction confidence, without explicitly distinguishing
the different sources of uncertainty introduced by underlying factors. To address
these challenges, we propose a novel Emotion-Specific Evidence Decomposition framework
(ESED) that leverages evidential deep learning to explicitly model and disentangle
multimodal emotional uncertainty. Rather than directly fusing features, ESED decomposes
each modality's evidence into three interpretable components: (1) emotion-consistent
evidence, capturing shared emotional cues across modalities; (2) emotion-specific
evidence, highlighting the unique emotional role of each modality; and (3) dynamic
evidence, modeling utterance-level temporal variations. These components are adaptively
weighted based on emotional intensity, ambiguity, and dynamicity, quantified via prediction
entropy, inter-modal divergence, and temporal variance. The final prediction is obtained
through an adaptive fusion of these weighted components. Extensive experiments show
that ESED outperforms the state-of-the-art methods on the MELD and IEMOCAP datasets,
demonstrating the effectiveness of our proposed method.
Reinforcement Learning-Driven Generative Retrieval with Semantic-aligned Multi-Layer
Identifiers
- Bo Xu
- Yicen Tian
- Xiaokun Zhang
- Erchen Yu
- Dailin Li
- Linlin Zong
- Hongfei Lin
Generative retrieval enhances retrieval effectiveness by generating document identifiers
represented in natural language. However, current methods often struggle with two
major challenges: limited identifier quality and insufficient query-document interaction,
leading to limited retrieval performance. To tackle these challenges, we propose a
novel generative retrieval framework integrated with semantic-aligned multi-layer
identifiers and reinforcement learning. To improve identifier quality, we design a
prompt-driven multi-task learning strategy to generate three types of hierarchical
identifiers: summary, keyword, and pseudo-query, to capture multi-granularity document
semantics. Furthermore, we adopt supervised fine-tuning to integrate these identifiers.
To improve query-document interaction, we devise a multi-view ranking fusion mechanism
that combines retrieval results across multi-layer identifiers. We further employ
GRPO-based reinforcement learning with dense similarity rewards and a difficulty-aware
negative sampling strategy to optimize the generated identifiers. Experiments on multiple
benchmark datasets show that our framework significantly outperforms existing generative
retrieval methods, offering a promising solution for building more effective and semantically
aligned retrieval systems. The code for our model is publicly available at https://github.com/yicentian02/GRAM-RL.
The Structure of Cross-National Collaboration in Open-Source Software Development
- Henry Xu
- Katy Yu
- Hao He
- Hongbo Fang
- Bogdan Vasilescu
- Patrick S. Park
Open-source software (OSS) development platforms, such as GitHub, expand the potential
for cross-national collaboration among developers by lowering the geographic, temporal,
and coordination barriers that limited software innovation in the past. However, research
has shown that the technological affordances that facilitate cross-national collaboration
do not uniformly benefit all countries. Using the GitHub Innovation Graph dataset,
which aggregates the complete cross-country collaborations among the entire population
of GitHub developers, we present quantitative evidence of deep-seated religious and
cultural affinities, shared colonial histories, and geopolitical factors structuring
the collaborations between non-U.S. country pairs that become visible when the overarching
dominance of the U.S. is removed from the data. This study highlights the opportunities
to develop decentralizing strategies to facilitate new collaborations between developers
in non-U.S. countries, thereby fostering the development of novel, innovative solutions.
More generally, this study also underscores the importance of contextualizing user
behavior and knowledge management in information systems with long-term, macro-social
conditions in which these systems are inextricably embedded.
CLAP: Coreference-Linked Augmentation for Passage Retrieval
- Huanwei Xu
- Lin Xu
- Liang Yuan
Large Language Model (LLM)-based passage expansion has shown promise for enhancing
first-stage retrieval, but often underperforms with dense retrievers due to semantic
drift and misalignment with their pretrained semantic space. Beyond this, only a portion
of a passage is typically relevant to a query, while the rest introduces noise, an
issue compounded by chunking techniques that break coreferential continuity. We propose
Coreference-Linked Augmentation for Passage Retrieval (CLAP), a lightweight LLM-based expansion framework that segments passages into coherent
chunks, resolves coreference chains, and generates localized pseudo-queries aligned
with dense retriever representations. A simple fusion of global topical signals and
fine-grained subtopic signals achieves robust performance across domains. CLAP yields
consistent gains even as retriever strength increases, enabling first-stage retrieval
to match or surpass second-stage rerankers such as BM25 + MonoT5-3B, exceeding the
reranker by up to +20.68 nDCG@10 on ArguAna. These improvements are especially notable
in out-of-domain settings, where conventional LLM-based expansion methods relying
on domain knowledge often falter. CLAP instead adopts a logic-centric pipeline that
enables robust, domain-agnostic generalization.
LiveVal: Real-time and Trajectory-based Data Valuation via Adaptive Reference Points
- Jie Xu
- Zihan Wu
- Cong Wang
- Xiaohua Jia
Data valuation quantifies the contribution of each training data, enabling harmful
data detection and enhancing model robustness. However, existing methods are typically
post-hoc and require fully trained models, making them computationally expensive and
unable to detect harmful data early in training. We propose LiveVal, a real-time and
trajectory-based data valuation method that assesses training data by analyzing their
influence on the optimization trajectory. LiveVal includes three key innovations:
1) a real-time valuation framework with minimal overhead, seamlessly integrated into
standard training processes; 2) an adaptive reference point mechanism that assesses
data impact on generalization; and 3) a normalization technique that ensures fair
comparisons across training stages. Theoretical analysis shows that LiveVal achieves
directional alignment, boundedness, stability, and fairness. Experiments demonstrate
that LiveVal achieves up to 180× speedup over baseline methods while maintaining robust
performance across diverse models and datasets.
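A hedged sketch of the intuition behind trajectory-based valuation: during training, a sample is scored by how well its gradient step moves the parameters toward a reference point taken from later in the trajectory. The reference choice and the normalization below are illustrative only, not LiveVal's exact mechanism.

```python
import numpy as np

def step_value(grad_sample, theta_now, theta_ref, lr=0.1, eps=1e-12):
    """Positive value -> this sample's SGD step pushes parameters toward the reference."""
    direction = theta_ref - theta_now                        # where training is heading
    progress = np.dot(-lr * grad_sample, direction)          # projection of the SGD step
    return progress / (np.linalg.norm(direction) + eps)      # normalize across training stages

theta_now = np.array([0.0, 0.0])
theta_ref = np.array([1.0, 1.0])                             # adaptive reference point
helpful = np.array([-1.0, -1.0])                             # its update moves toward the reference
harmful = np.array([1.0, 1.0])                               # its update moves away from it
print(step_value(helpful, theta_now, theta_ref), step_value(harmful, theta_now, theta_ref))
```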
KV-Auditor: Auditing Local Differential Privacy for Correlated Key-Value Estimation
- Jingnan Xu
- Leixia Wang
- Xiaofeng Meng
To protect privacy for data-collection-based services, local differential privacy
(LDP) is widely adopted due to its rigorous theoretical bound on privacy loss. However,
mistakes in complex theoretical analysis or subtle implementation errors may undermine
its practical guarantee. To address this, auditing is crucial to confirm that LDP
protocols truly protect user data. Existing auditing methods, however, mainly
target machine learning and federated learning tasks based on centralized differential
privacy (DP), with limited attention to LDP. Moreover, the few studies on LDP auditing
focus solely on the simple frequency estimation task for discrete data, leaving correlated
key-value data - which requires both discrete frequency estimation for keys and continuous
mean estimation for values - unexplored.
To bridge this gap, we propose KV-Auditor, a framework for auditing LDP-based key-value
estimation mechanisms by estimating their empirical privacy lower bounds. Unlike
traditional LDP auditing methods that rely on binary output predictions, KV-Auditor
estimates this lower bound by analyzing unbounded output distributions, supporting
continuous data. Specifically, we classify state-of-the-art LDP key-value mechanisms
into interactive and non-interactive types. For non-interactive mechanisms, we propose
horizontal KV-Auditor for small domains with sufficient samples and vertical KV-Auditor
for large domains with limited samples. For interactive mechanisms, we design a segmentation
strategy to capture incremental privacy leakage across iterations. Finally, we perform
extensive experiments to validate the effectiveness of our approach, offering insights
for optimizing LDP-based key-value estimators.
Improving Recommendation Fairness via Graph Structure and Representation Augmentation
- Tongxin Xu
- Wenqiang Liu
- Chenzhong Bin
- Cihan Xiao
- Zhixin Zeng
- Tianlong Gu
Graph Convolutional Networks (GCNs) have become increasingly popular in recommendation
systems. However, recent studies have shown that GCN-based models can cause sensitive
information to disseminate widely in the graph structure, amplifying data bias and
raising fairness concerns. While various fairness methods have been proposed, most
of them neglect the impact of biased data on representation learning, which results
in limited fairness improvement. Moreover, some studies have focused on constructing
fair and balanced data distributions through data augmentation, but these methods
significantly reduce utility due to disruption of user preferences. In this paper,
we aim to design a fair recommendation method from the perspective of data augmentation
to improve fairness while preserving recommendation utility. To achieve fairness-aware
data augmentation with minimal disruption to user preferences, we propose two prior
hypotheses. The first hypothesis identifies sensitive interactions by comparing outcomes
of performance-oriented and fairness-aware recommendations, while the second one focuses
on detecting sensitive features by analyzing feature similarities between biased and
debiased representations. Then, we propose a dual data augmentation framework for
fair recommendation, which includes two data augmentation strategies to generate fair
augmented graphs and feature representations. Furthermore, we introduce a debiasing
learning method that minimizes the dependence between the learned representations
and sensitive information to eliminate bias. Extensive experiments on two real-world
datasets demonstrate the superiority of our proposed framework.
SST: Multi-Scale Hybrid Mamba-Transformer Experts for Time Series Forecasting
- Xiongxiao Xu
- Canyu Chen
- Yueqing Liang
- Baixiang Huang
- Guangji Bai
- Liang Zhao
- Kai Shu
Time series forecasting has made significant advances, including with Transformer-based
models. The attention mechanism in Transformer effectively captures temporal dependencies
by attending to all past inputs simultaneously. However, its quadratic computational
complexity with respect to sequence length limits the scalability for long-range modeling.
Recent state space models (SSMs) such as Mamba offer a promising alternative by achieving
linear complexity without attention. Yet, Mamba compresses historical information
into a fixed-size latent state, potentially causing information loss and limiting
representational effectiveness. This raises a key research question: Can we design
a hybrid Mamba-Transformer architecture that is both effective and efficient for time
series forecasting? To address it, we adapt a hybrid Mamba-Transformer architecture
Mambaformer, originally proposed for language modeling, to the time series domain.
Preliminary experiments reveal that naively stacking Mamba and Transformer layers
in Mambaformer is suboptimal for time series forecasting, due to an information interference
problem. To mitigate this issue, we introduce a new time series decomposition strategy
that separates time series into long-range patterns and short-range variations. Then
we show that Mamba excels at capturing long-term structures, while Transformer is
more effective at modeling short-term dynamics. Building on this insight, we propose
State Space Transformer (SST), a multi-scale hybrid model with expert modules: a Mamba
expert for long-range patterns and a Transformer expert for short-term variations.
To facilitate learning the patterns and variations, SST employs a multi-scale patching
mechanism to adaptively adjust time series resolution: low resolution for long-term
patterns and high resolution for short-term variations. Comprehensive experiments
on real-world datasets demonstrate that SST achieves state-of-the-art performance
while scaling linearly with sequence length (O(L)). The code is available on GitHub.
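A hedged sketch of the decomposition-and-routing idea: a moving average splits the series into a smooth long-range component (routed to the Mamba expert with coarse patches) and a residual short-range component (routed to the Transformer expert with fine patches). The kernel and patch sizes below are illustrative, not SST's actual settings.

```python
import torch
import torch.nn.functional as F

def decompose_and_patch(x, kernel=25, long_patch=16, short_patch=4):
    # x: (batch, length) univariate series
    pad = kernel // 2
    trend = F.avg_pool1d(F.pad(x.unsqueeze(1), (pad, pad), mode="replicate"),
                         kernel, stride=1).squeeze(1)            # long-range pattern
    residual = x - trend                                          # short-range variation
    long_tokens = trend.unfold(-1, long_patch, long_patch)        # low-resolution patches
    short_tokens = residual.unfold(-1, short_patch, short_patch)  # high-resolution patches
    return long_tokens, short_tokens

x = torch.randn(8, 96)
lt, st = decompose_and_patch(x)
print(lt.shape, st.shape)   # torch.Size([8, 6, 16]) torch.Size([8, 24, 4])
```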
Contrastive Multi-View Graph Hashing
- Yang Xu
- Zuliang Yang
- Kai Ming Ting
Multi-view graph data, which both captures node attributes and rich relational information
from diverse sources, is becoming increasingly prevalent in various domains. The effective
and efficient retrieval of such data is an important task. Although multi-view hashing
techniques have offered a paradigm for fusing diverse information into compact binary
codes, they typically assume attribute-based inputs per view. This makes them unsuitable
for multi-view graph data, where effectively encoding and fusing complex topological
information from multiple heterogeneous graph views to generate unified binary embeddings
remains a significant challenge. In this work, we propose Contrastive Multi-view Graph
Hashing (CMGHash), a novel end-to-end framework designed to learn unified and discriminative
binary embeddings from multi-view graph data. CMGHash learns a consensus node representation
space using a contrastive multi-view graph loss, which aims to pull k-nearest neighbors from all graphs closer while pushing away negative pairs, i.e.,
non-neighbor nodes. Moreover, we impose binarization constraints on this consensus
space, enabling its conversion to a corresponding binary embedding space at minimal
cost. Extensive experiments on several benchmark datasets demonstrate that CMGHash
significantly outperforms existing approaches in terms of retrieval accuracy.
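An illustrative sketch of the two ingredients described above, with made-up tensors: a contrastive objective that pulls a node toward its neighbours pooled over graph views and pushes it away from non-neighbours, plus a cheap sign() binarization of the consensus embedding. The pairwise loss form is an assumption for brevity, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def neighbor_contrastive_loss(z, pos_idx, neg_idx, temp=0.2):
    # z: (n_nodes, d) consensus embeddings; pos_idx/neg_idx: (n_nodes, k) node indices
    z = F.normalize(z, dim=-1)
    pos = (z.unsqueeze(1) * z[pos_idx]).sum(-1) / temp    # similarity to k-nearest neighbours
    neg = (z.unsqueeze(1) * z[neg_idx]).sum(-1) / temp    # similarity to non-neighbours
    return -(F.logsigmoid(pos).mean() + F.logsigmoid(-neg).mean())

def binarize(z):
    return torch.sign(z)          # {-1, +1} codes for Hamming-space retrieval

z = torch.randn(100, 32)
pos = torch.randint(0, 100, (100, 5))    # k-nearest neighbours pooled over graph views
neg = torch.randint(0, 100, (100, 5))    # sampled non-neighbour nodes
print(neighbor_contrastive_loss(z, pos, neg).item(), binarize(z)[0, :8])
```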
From Policy Comparison to Process Consistency and Beyond
- Yifan Xu
- Yujia Yin
- Yiming Xing
- Yifan Chen
Statistical Policy Comparison (SPC) assesses the equivalence of two stochastic policies
(policy consistency) and has received broad attention. However, the SPC framework
implicitly assumes the invariance of decision environments, and therefore fails to
address a flurry of real-world data science applications. In this work, we refer to
this overlooked issue as environment consistency and, together with policy consistency, extend it to a generalized concept, process consistency, for systematically comparing policy trials under the Markov decision process (MDP)
framework. To address process consistency, we propose a unified comparison framework,
extending beyond traditional statistical policy comparison studies by incorporating
both policy and environment comparisons. For policy consistency, existing statistical
policy comparison methods can be seamlessly integrated into our intentionally-designed
framework without modification. Specifically for environment consistency (the focus
of this work), we devise fine-grained return tests to capture shifts of key elements in MDPs; notably, under special cases where
trajectory likelihood information is available or can be estimated, we introduce a
trajectory test based on the likelihood ratio test (LRT), offering increased testing power.
Extensive experiments demonstrate that our proposed testing methods achieve higher
statistical power than existing approaches in testing process consistency, establishing
their effectiveness across diverse real-world scenarios. Our code is available at
https://github.com/bcxyf123/MDP-Testing.git.
LGC-CR: Few-shot Knowledge Graph Completion via Local Global Contrastive Learning
and LLM-Guided Refinement
- Yiming Xu
- Qi Song
- Yihan Wang
- Wangqiu Zhou
- Junli Liang
Recent years have witnessed increasing interest in few-shot knowledge graph completion
(FKGC), which aims to infer novel query triples for few-shot relations from limited
references. Despite promising progress, existing methods face two key challenges:
(1) They often overlook rich higher-order neighbors, while traditional high-order
aggregation methods are prone to introducing noise and lack effective alignment across
multi-view neighborhood information. (2) Meta-learning methods over-rely on embeddings,
making them susceptible to spurious relational patterns. Meanwhile, LLM-based methods,
despite their potential, suffer from hallucinations and input constraints. To this
end, we propose a novel framework that combines meta-learning, enhanced via a Local-Global Contrastive network, with LLM-guided Contextual Refinement (LGC-CR). At the data level, we design a local-global contrastive network
to jointly aggregate relevant local features and capture stable global representations
while filtering high-order noise, then align these two views through a dual contrast
module to ensure consistency. At the model level, we employ an LLM refinement module,
which retrieves relevant contexts to construct prompts and applies a knowledge selector
to identify high-quality facts based on diversity and centrality, enabling efficient
fine-tuning of LLMs to refine the preliminary predictions of meta-learning. The experimental
results demonstrate that LGC-CR delivers better and more robust performance than state-of-the-art
baselines, with Hit@1 improvements of 8.1%, 21.7%, and 20.6% on NELL, Wiki, and FB15K,
respectively.
KALE: Knowledge Aggregation for Label-free Model Enhancement
- Yuebin Xu
- Xuemei Peng
- Zhiyi Chen
- Zeyi Wen
Large foundation models have demonstrated remarkable success in natural language processing
and computer vision. Applying the large models to downstream tasks often requires
fine-tuning, in order to boost the predictive accuracy. However, the fine-tuning process
relies heavily on labeled data and extensive training. This dependency makes fine-tuning
impractical for niche applications, such as rare object detection or specialized medical
tasks. To overcome these limitations, we propose KALE: Knowledge Aggregation for Label-free
model Enhancement, a label-free method for model enhancement, leveraging knowledge
aggregation via model fusion and adaptive representation alignment. Our method is
powered by a carefully designed joint self-cooperative optimization function that
considers (i) multi-granularity optimization (task-specific and layer-specific), (ii)
self and cooperative supervision integration, and (iii) mitigation of error accumulation
caused by entropy minimization. Additionally, we introduce a class cardinality-aware
sample filtering to ensure the stability of the fusion process. We also design a lightweight
representation alignment technique to refine the fusion coefficient in a few shots
for quality enhancement. We evaluate our method on multiple image classification datasets
using ViT-B/32 and ViT-L/14 backbones. Experimental results demonstrate that our label-free
method consistently outperforms state-of-the-art unsupervised approaches, including
TURTLE, as well as supervised full fine-tuning, in terms of average performance. Specifically,
compared to TURTLE, our method achieves average improvements of 20.7% with ViT-B/32
and 19.5% with ViT-L/14. Furthermore, on the challenging SUN397 dataset, our method
surpasses supervised full fine-tuning by 4% and 2.3% with ViT-B/32 and ViT-L/14, respectively.
Fine-Grained Graph Rationalization
- Zhe Xu
- Menghai Pan
- Yuzhong Chen
- Huiyuan Chen
- Yuchen Yan
- Mahashweta Das
- Hanghang Tong
Rationale discovery is defined as finding a subset of the input data that maximally
supports the prediction of downstream tasks. In the context of graph machine learning,
graph rationale is defined as identifying the critical subgraph in the given graph
topology. In contrast to the rationale subgraph, the remaining subgraph is named the
environment subgraph. Graph rationalization can enhance the model performance because
the mapping between the graph rationale and the prediction label is viewed as invariant,
by definition. To ensure the discriminative power of the extracted rationale subgraphs,
a key technique named intervention is applied, whose core idea is that given changing
environment subgraphs, the semantics from the rationale subgraph is invariant, which
guarantees the correct prediction result. However, most, if not all, of the existing
graph rationalization methods develop their intervention strategies on the graph level,
which is coarse-grained. In this paper, we propose FIne-grained Graph rationalization
(FIG). Our idea is driven by the self-attention mechanism, which provides rich interactions
between input nodes. Based on that, FIG can achieve node-level and virtual node-level
intervention. Our experiments involve 7 real-world datasets, and the proposed FIG
shows significant performance advantages compared to 13 baseline methods.
Evaluating and Addressing Fairness Across User Groups in Negative Sampling for Recommender
Systems
- Yueqing Xuan
- Kacper Sokol
- Mark Sanderson
- Jeffrey Chan
Recommender systems trained on implicit feedback data rely on negative sampling to
distinguish positive items from negative items for each user. Since the majority of
positive interactions come from a small group of active users, negative samplers are
often impacted by data imbalance, leading them to choose more informative negatives
for prominent users while providing less useful ones for less active users.
This leads to inactive users being further marginalised in the training process, thus
receiving inferior recommendations. In this paper, we conduct a comprehensive empirical
study demonstrating that state-of-the-art negative sampling strategies provide more
accurate recommendations for active users than for inactive users. We also find that
increasing the number of negative samples for each positive item improves the average
performance, but the benefit is distributed unequally across user groups, with active
users experiencing performance gain while inactive users suffering performance degradation.
To address this, we propose a group-specific negative sampling strategy that assigns
smaller negative ratios to inactive user groups and larger ratios to active groups.
Experiments on eight negative samplers show that our approach improves user-side fairness
and performance when compared to a uniform global ratio.
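To make the group-specific ratio idea concrete, here is a minimal Python sketch of activity-dependent negative sampling; the grouping rule, ratio values, and function names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def negative_ratio(user_activity: int, median_activity: float,
                   low_ratio: int = 2, high_ratio: int = 8) -> int:
    """Assign a larger negative ratio to active users and a smaller one to inactive users."""
    return high_ratio if user_activity >= median_activity else low_ratio

def sample_negatives(pos_items: set, n_items: int, ratio: int) -> list:
    """Uniformly draw `ratio` items the user has never interacted with."""
    negatives = []
    while len(negatives) < ratio:
        cand = int(rng.integers(n_items))
        if cand not in pos_items:
            negatives.append(cand)
    return negatives

# Toy usage: one active and one inactive user over a catalogue of 50 items.
interactions = {0: {1, 5, 9, 12, 20, 33}, 1: {4}}
median = float(np.median([len(v) for v in interactions.values()]))
for user, pos in interactions.items():
    k = negative_ratio(len(pos), median)
    print(f"user {user}: ratio={k}, negatives={sample_negatives(pos, 50, k)}")
```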
Compensating Information and Capturing Modal Preferences in Multimodal Recommendation:
A Dual-Path Representation Learning Framework
- Cairong Yan
- Xubin Mao
- Zijian Wang
- Xicheng Zhao
- Linlin Meng
In the context of information explosion, multimodal recommender systems (MMRS) have
demonstrated great potential in capturing users' complex preferences and enhancing
recommendation performance by integrating multimodal data such as images and text.
However, multimodal data inherently suffers from semantic inconsistency, which can introduce information conflicts or noise. Moreover, users' reliance on
different modalities varies dynamically with context and time (multimodal dynamic preferences). These challenges may lead to truth deviation and deep semantic mismatch, ultimately
degrading recommendation performance. To address these issues, we propose Dual-Path Multimodal Recommendation (DPRec), a novel model that improves the precision and robustness of recommendation through
cross-modal information compensation and dynamic modal preference learning. Specifically,
DPRec first employs a cross-modal attention mechanism to dynamically model inter-modal
correlations, effectively exploring complementary and shared features for robust user
and item representations. Second, it integrates feature projection, modality alignment,
and dynamic weighting mechanisms to adaptively adjust modality importance based on
user context, ensuring flexibility in handling preference dynamics. Lastly, a modality
contrastive loss is utilized to maximize mutual information between modalities, mitigating
semantic mismatch by enhancing deep collaborative representations. Extensive experiments
on three public datasets show that DPRec consistently outperforms state-of-the-art
(SOTA) methods, achieving average improvements of 3.94% in Recall@20 and 3.84% in
NDCG@20. Our code is publicly available at: https://anonymous.4open.science/r/DPRec-4D15.
FedSTEP: Asynchronous and Staleness-Aware Personalization for Efficient Federated
Learning
Personalized Federated Learning (PFL) aims to provide client-specific models that
adapt to local data distributions while leveraging shared knowledge across clients.
A common design in PFL is the head-representation architecture, which combines a shared
global representation with a local head on each client. Although effective, deploying
this architecture in real-world systems remains challenging due to the presence of
stragglers and the high communication cost. To address these issues, we propose FedSTEP, a unified framework that integrates asynchronous training with dynamic communication
sparsification. Specifically, it adaptively adjusts each client's local training duration
and communication sparsity based on staleness, enabling more efficient coordination
between local adaptation and global representation. This design mitigates the impact
of stragglers and ensures robust performance in heterogeneous environments. We provide
a theoretical analysis of the convergence behavior and communication efficiency of
FedSTEP under standard assumptions. Extensive experiments on five public datasets demonstrate
that FedSTEP consistently outperforms existing methods. It achieves up to 4.65% higher accuracy,
a 3.68× speedup in training, and a 1.91× reduction in communication cost.
Higher-order Structure and Semantics-enhanced User Profiling for Recommendation
- Tan Yanchao
- Xinyi Huang
- Zhijun Chen
- Hang Lv
- Hengyu Zhang
- Wei Huang
- Guofang Ma
Accurate user profiles are crucial for personalized recommendation systems to mitigate
information overload on large-scale online platforms. While recent advances in large
language models have enhanced semantic understanding for profile construction through
textual artifacts, existing methods often neglect the higher-order structural patterns
inherent in user-item interaction graphs, a key limitation for achieving accurate and
diverse recommendations. In this paper, we propose SSPRec, a Higher-order Structure and Semantics-enhanced User Profiling method for Recommendation. Specifically, we first introduce
a multi-hop proximity matrix over item-item transitions, followed by low-rank approximation
and clustering to group users based on behavioral similarity. Group-level user profiles
are then distilled via representative keywords extracted from co-interacted items,
and collaborative embeddings are concurrently learned from the interaction graph.
To integrate collaborative signals with language-based profiles, we introduce a cross-view
contrastive objective that encourages coherence between structural and semantic representations.
Final recommendations are made using a fused user-item similarity score. Extensive
experiments on four real-world datasets show that SSPRec not only outperforms baselines
in accuracy (with 46.35% improvements), but also remains diverse and robust, even
under incomplete interactions.
Ordinal Embedding for Collaborative Filtering: A Unified Regularization for Enhanced
Generalization and Interpretability
- Jie Yang
- Ling Luo
- Nestor Cabello
- Lars Kulik
Collaborative filtering is a primary paradigm of modern recommender systems. A typical
practice is to embed collaborative signals into a latent space and infer recommendation
scores based on the similarities between user and item embeddings. Besides inter-type
similarities (i.e., user-item relationships), intra-type similarities (i.e., user-user,
and item-item) are also essential as they capture the intrinsic structure of users
and items. However, many existing recommendation models only learn inter-type similarities
using objectives like ranking loss or binary classification loss, while neglecting
intra-type similarities. Consequently, the intrinsic structures of users and items
are often distorted in the latent space, where users with similar historical interactions
diverge more than those dissimilar. In this study, we show the importance of preserving
the ordinal relations of intra-type similarities. We provide a theoretical analysis
suggesting that preserving intra-type similarity rankings can enhance a model's generalizability
and interpretability. In addition, we propose a regularization that enforces a constraint
on the rankings of intra-type similarities, ensuring that learning inter-type similarities
does not break intrinsic ordinal structures. It can be seamlessly integrated into
most latent factor models and can be jointly trained with their original objectives.
Extensive experiments on 4 benchmark datasets and 5 representative models show that
our ordinal regularization can consistently improve recommendation performance, and
enhance the intra-type similarity coherence in the latent space. The results also
exhibit enhanced generalizability and interpretability of recommendations.
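A rough intuition for an ordinal constraint on intra-type similarities can be sketched as a hinge penalty on rank violations; the triplet sampling scheme and loss form below are assumptions for illustration, not the regularizer proposed in the paper.

```python
import numpy as np

def ordinal_regularizer(emb: np.ndarray, ref_sim: np.ndarray,
                        n_triplets: int = 1000, margin: float = 0.0,
                        seed: int = 0) -> float:
    """Hinge penalty on intra-type similarity rank violations.

    emb:     (n, d) embeddings of one entity type (users or items)
    ref_sim: (n, n) reference intra-type similarities, e.g. Jaccard of histories
    """
    rng = np.random.default_rng(seed)
    n = emb.shape[0]
    unit = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
    cos = unit @ unit.T                                   # learned similarities
    i = rng.integers(n, size=n_triplets)
    j = rng.integers(n, size=n_triplets)
    k = rng.integers(n, size=n_triplets)
    keep = ref_sim[i, j] > ref_sim[i, k]                  # reference order: sim(i,j) > sim(i,k)
    violation = np.maximum(0.0, margin + cos[i, k] - cos[i, j])
    return float(violation[keep].mean()) if keep.any() else 0.0

# Toy usage: random embeddings regularized against a random reference structure.
rng = np.random.default_rng(1)
print(ordinal_regularizer(rng.standard_normal((30, 8)), rng.random((30, 30))))
```

In practice such a term would be added, with a weighting coefficient, to the model's original ranking or classification objective.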
Enhancing and Assessing Instruction-Following with Fine-Grained Instruction Variants
- Jiuding Yang
- Hui Liu
- Weidong Guo
- Yu Xu
- Di Niu
Aligning Large Language Models (LLMs) with nuanced user instructions is critical for
their effective deployment in real-world applications. While prior methods focus on
enhancing data diversity and complexity, they often overlook models' sensitivity to
fine-grained variations in semantically similar instructions. To address this, we
introduce DeMoRecon, a data augmentation framework that decomposes complex instructions
into sub-components, modifies individual elements, and reconstructs them into instruction
variants. This method preserves contextual integrity while injecting targeted variability
essential for fine-grained instruction-following. Based on DeMoRecon, we construct
the FGIV dataset, comprising over 1,700 seed instructions and thousands of nuanced
variants designed for both supervised fine-tuning and preference-based alignment.
Experimental results show that LLMs trained with FGIV achieve up to +10.2% improvement
on our fine-grained FGIV-Eval benchmark and up to +8.8% on existing benchmarks such
as FollowBench and InfoBench. These findings highlight the value of FGIV in advancing
instruction sensitivity and robustness in LLMs.
MMiC: Mitigating Modality Incompleteness in Clustered Federated Learning
- Lishan Yang
- Wei Emma Zhang
- Quan Z. Sheng
- Lina Yao
- Weitong Chen
- Ali Shakeri
In the era of big data, data mining has become indispensable for uncovering hidden
patterns and insights from vast and complex datasets. The integration of multimodal
data sources further enhances its potential. Multimodal Federated Learning (MFL) is
a distributed approach that enhances the efficiency and quality of multimodal learning,
ensuring collaborative work and privacy protection. However, missing modalities pose
a significant challenge in MFL, often due to data quality issues or privacy policies
across the clients. In this work, we present MMiC, a framework for Mitigating Modality
incompleteness in MFL within the Clusters. MMiC replaces partial parameters within
client models inside clusters to mitigate the impact of missing modalities. Furthermore,
it leverages the Banzhaf Power Index to optimize client selection under these conditions.
Finally, MMiC employs an innovative approach to dynamically control global aggregation
by utilizing Markowitz Portfolio Optimization. Extensive experiments demonstrate that
MMiC consistently outperforms existing federated learning architectures in both global
and personalized performance on multimodal datasets with missing modalities, confirming
the effectiveness of our proposed solution. Our code is available at https://github.com/gotobcn8/MMiC.
TCPN: Temporal Pyramidal Recurrent Network with Contrastive Learning for Temporal
Knowledge Graph Reasoning
- Liu Yang
- Zixuan Luo
- Tingxuan Chen
- Zidong Wang
- Limin Liu
Temporal Knowledge Graphs (TKGs) serve as crucial tools for representing dynamic changes
in the real world. Extrapolation reasoning within TKGs aims to predict entirely unknown
future facts based on limited historical data, offering considerable practical value
across various fields. However, existing methods generally focus on the recurrence
and periodicity of historical facts, while overlooking the dynamic interactions associated
with future facts. Moreover, these methods fail to capture historical evolutionary
patterns, which grow increasingly complex as historical data accumulates. To this
end, we propose TCPN, a novel Temporal Pyramidal Recurrent Network with contrastive
learning for TKG extrapolation reasoning. Specifically, TCPN leverages a temporal
pyramidal recurrent network to capture historical dependencies across multiple temporal
scales, thereby refining temporal feature representations over extended time spans.
Furthermore, TCPN seamlessly integrates contrastive learning to effectively align
historical information with query semantics relevant to future facts. Lastly, we incorporate
an adaptive time-aware mechanism, which uniformly models long- and short-term dependencies
in time series with different granularities, explicitly fusing temporal feature information.
Extensive experiments on four widely used TKG datasets show that TCPN significantly
outperforms state-of-the-art methods across all metrics.
Unplug and Play Language Models: Decomposing Experts in Language Models at Inference
Time
- Nakyeong Yang
- Jiwon Moon
- Junseok Kim
- Yunah Jang
- Kyomin Jung
Pre-trained on large-scale text corpora and equipped with huge numbers of parameters, language models operate as multi-task experts using a single model architecture. However, recent studies
have revealed that certain neurons play disproportionately important roles in solving
specific tasks, suggesting that task-relevant substructures can be isolated and selectively
activated for each task. Therefore, we introduce Decomposition of Experts (DoE), a
novel framework that dynamically identifies and activates task-specific experts within
a language model to reduce inference cost without sacrificing accuracy. We first define
a task expert as a set of parameters that significantly influence the performance
of a specific task and propose a four-step unplug-and-play process: (1) receiving
a user request, (2) identifying the corresponding task expert, (3) performing inference
using the expert-localized model, and (4) restoring the original model and waiting
for the next task. Using attribution methods and prompt tuning, DoE isolates task-relevant
neurons, minimizing computational overhead while maintaining task performance. We
assume a setting where a language model receives user requests from five widely used
natural language understanding benchmarks, processing one task at a time. In this
setup, we demonstrate that DoE achieves up to a 1.73× inference speed-up with a 65%
pruning rate, without compromising accuracy. Comparisons with various task expert
localization methods reveal that DoE effectively identifies task experts, while ablation
studies validate the importance of its components. Additionally, we analyze the effects
of batch size, token count, and layer types on inference speed-up, providing practical
insights for adopting DoE. The proposed framework is both practical and scalable,
applicable to any transformer-based architecture, offering a robust solution for efficient
task-specific inference.
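As a rough illustration of the unplug-and-play idea, the sketch below masks parameters by a simple attribution score, runs inference with the pruned weights, and restores the original model afterwards; the scoring rule and keep ratio are hypothetical and do not reproduce DoE's actual procedure.

```python
import numpy as np

def expert_mask(weights: np.ndarray, grads: np.ndarray, keep: float = 0.35) -> np.ndarray:
    """Keep the top-`keep` fraction of parameters ranked by a |weight * grad| attribution score."""
    score = np.abs(weights * grads)
    threshold = np.quantile(score, 1.0 - keep)
    return (score >= threshold).astype(weights.dtype)

def run_with_expert(weights, grads, inference_fn, x, keep=0.35):
    """Unplug: mask out non-expert parameters, run inference, then restore the full model."""
    original = weights.copy()
    try:
        weights *= expert_mask(weights, grads, keep)   # expert-localized model
        return inference_fn(weights, x)
    finally:
        weights[:] = original                          # wait for the next task with full weights

# Toy "model": a dot product with its weight vector.
w = np.array([0.5, -1.2, 0.03, 2.0])
g = np.array([0.4, -0.9, 0.01, 1.5])
print(run_with_expert(w, g, lambda weights, x: float(weights @ x), x=np.ones(4), keep=0.5))
print(w)  # unchanged after restoration
```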
StreamingRT: Stream KNN Join with Ray Tracing Core
- Shixi Yang
- Kai Zhang
- Zhigang Zhao
- Chunxiao Wang
- Zhenying He
- Yinan Jing
- X. Sean Wang
Efficient processing of k-nearest neighbor (kNN) join operations on streaming data
is critical for applications in location-aware services, recommendation systems, and
spatial analytics. To serve users in real time, these applications generally require
a high-performance kNN join on continuously changing streaming data. This paper introduces
StreamingRT, a framework that leverages ray tracing (RT) cores in GPUs to accelerate
stream kNN joins in 3D space. By modeling stream data as large primitives and casting queries as short rays, StreamingRT transforms the kNN join problem into an efficient
ray tracing task. To address the ray tracing index updating overhead on stream data,
we propose two key techniques, i.e., boundary-extended point partitioning and query-driven
BVH lazy updating. Moreover, we also adopt multi-BVH coprocessing and CPU-GPU pipelining
to improve performance. These techniques enable efficient stream kNN join on ray tracing
cores, delivering unprecedented performance improvement. Experimental evaluations
show that StreamingRT can achieve up to 2.2× and 5.8× speedup over the state-of-the-art
approach on RT cores and CUDA cores, respectively.
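For readers unfamiliar with the operation itself, the following NumPy sketch shows the plain semantics of a stream kNN join over a sliding window as a brute-force reference point; it does not use RT cores or any of StreamingRT's indexing techniques.

```python
import numpy as np

def knn_join(window: np.ndarray, queries: np.ndarray, k: int = 4) -> np.ndarray:
    """Brute-force kNN join in 3D: for each query, return indices of its k nearest window points."""
    d2 = ((queries[:, None, :] - window[None, :, :]) ** 2).sum(-1)   # (q, w) squared distances
    return np.argsort(d2, axis=1)[:, :k]

# Toy sliding-window usage: each incoming batch replaces the oldest points.
rng = np.random.default_rng(1)
window = rng.random((1000, 3))
for _ in range(3):
    batch = rng.random((100, 3))
    window = np.vstack([window[len(batch):], batch])   # slide the window
    queries = rng.random((10, 3))
    print(knn_join(window, queries, k=4).shape)        # (10, 4)
```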
STEP: Stepwise Curriculum Learning for Context-Knowledge Fusion in Conversational
Recommendation
- Zhenye Yang
- Jinpeng Chen
- Huan Li
- Xiongnan Jin
- Xuanyang Li
- Junwei Zhang
- Hongbo Gao
- Kaimin Wei
- Senzhang Wang
Conversational recommender systems (CRSs) aim to proactively capture user preferences
through natural language dialogue and recommend high-quality items. To achieve this,
a CRS gathers user preferences via a dialogue module and builds user profiles through a recommendation module to generate appropriate recommendations. However, existing CRSs face challenges in capturing the deep semantics of user preferences and dialogue
context. In particular, the efficient integration of external knowledge graph (KG)
information into dialogue generation and recommendation remains a pressing issue.
Traditional approaches typically combine KG information directly with dialogue content,
which often struggles with complex semantic relationships, resulting in recommendations
that may not align with user expectations.
To address these challenges, we introduce STEP, a conversational recommender centered
on pre-trained language models that combines curriculum-guided context-knowledge fusion
with lightweight task-specific prompt tuning. At its heart, an F-Former progressively
aligns the dialogue context with knowledge-graph entities through a three-stage curriculum,
thus resolving fine-grained semantic mismatches. The fused representation is then
injected into the frozen language model via two minimal yet adaptive prefix prompts:
a conversation prefix that steers response generation toward user intent and a recommendation
prefix that biases item ranking toward knowledge-consistent candidates. This dual-prompt
scheme allows the model to share cross-task semantics while respecting the distinct
objectives of dialogue and recommendation. Experimental results show that STEP outperforms
mainstream methods in the precision of recommendation and dialogue quality in two
public datasets. Our code is available: https://github.com/Alex-bupt/STEP.
GFlowNet with Gradient-based Optimization for Bayesian Network Structure Learning
- Zhu Yang
- Kun Yue
- Zhiwei Qi
- Liang Duan
- Jianyu Li
Bayesian network (BN) structure learning on the discrete observations is crucial for
representing uncertainty in data. However, existing single-structure learning methods
are commonly trapped in local optima or yield structures that may lead to poorly
calibrated predictions. Although posterior approximation methods can quantify the
epistemic uncertainty over the learned BN structures, they cannot reliably identify
the optimal BN structures. To tackle these issues, we propose a GFlowNet with Gradient-based Optimization (GFlowOpt) method for BN structure learning. Initially, we employ
Bayesian Information Criterion (BIC) scores as the rewards of BN structures, and adopt
a contrastive learning-based training technique on GFlowNet to efficiently generate
much better DAG representations. Subsequently, we train a proxy model on the continuous
DAG representations by adopting the gradient-ascent-based optimization method to search for more high proxy-scoring discrete candidate DAGs. Finally, we adopt Hill Climbing (HC)
on these candidate DAGs to search high-scoring DAGs as the ultimate BN structures.
Extensive experiments conducted on benchmark datasets demonstrate the superiority of our proposed method compared with other state-of-the-art methods.
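Since BIC scores drive the structure rewards here, a minimal sketch of how a decomposable BIC score can be computed for a discrete network may help; the variable encoding and helper names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from itertools import product

def node_bic(data: np.ndarray, node: int, parents: list, cards: list) -> float:
    """BIC contribution of one node: maximum log-likelihood minus a complexity penalty.

    data:  (N, n_vars) integer matrix of discrete observations
    cards: cardinality (number of states) of each variable
    """
    N = data.shape[0]
    log_lik = 0.0
    for ps in product(*[range(cards[p]) for p in parents]):
        rows = np.ones(N, dtype=bool)
        for p, v in zip(parents, ps):
            rows &= data[:, p] == v
        counts = np.bincount(data[rows, node], minlength=cards[node]).astype(float)
        total = counts.sum()
        if total > 0:
            nz = counts > 0
            log_lik += float((counts[nz] * np.log(counts[nz] / total)).sum())
    n_params = (cards[node] - 1) * int(np.prod([cards[p] for p in parents]))
    return log_lik - 0.5 * np.log(N) * n_params

# Toy usage: three binary variables; a DAG's BIC is the sum of node_bic over all nodes.
rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(500, 3))
print(node_bic(data, node=2, parents=[0, 1], cards=[2, 2, 2]))
```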
PAnDA: Combating Negative Augmentation via Large Language Models for User Cold-Start
Recommendations
- Yantong Du
- Rui Chen
- Xiangyu Zhao
- Qilong Han
- A. K. Qin
The cold-start problem remains a long-standing challenge in recommender systems. Recent
advances in large language models (LLMs) have opened new avenues for addressing cold-start
scenarios through data augmentation. However, existing cold-start augmentation methods
often suffer from negative augmentation, manifesting as incomplete augmentation, where generated interactions fail to comprehensively reflect user preferences, and
inaccurate augmentation, where they conflict with user intent. These issues largely stem from two limitations:
(1) the inability to effectively incorporate collaborative signals, which are critical
for preference alignment, and (2) the lack of awareness of the downstream model's
learning dynamics during data augmentation. To the best of our knowledge, the latter
has not been studied in the literature.
Consequently, we propose a novel framework named PAnDA. To address the incomplete
augmentation issue, we propose a model-agnostic preference-aligned augmentation module
to iteratively extract and fuse textual information and collaborative information
by user-user preference matching and user-item preference coherence, which together
form a contextual cue to guide the augmentor to generate high-quality augmented data.
To overcome the inaccurate augmentation issue, we propose a model-specific downstream-model-aware
adaptation module to adaptively align the augmented data with the model's states during
the training process, guided by gradient similarity. Extensive experiments on three
public benchmark datasets demonstrate that PAnDA outperforms different groups of state-of-the-art
cold-start recommendation methods in all scenarios. The source code is publicly available
at https://github.com/YantongDU/PAnDA.
An Embarrassingly Simple but Effective Knowledge-enhanced Recommender
- Haibo Ye
- Lijun Zhang
- Yuan Yao
- XinJie Li
Knowledge graphs (KG) have demonstrated significant potential in recommender systems
by providing complementary semantic information that is typically absent in user-item
interaction graphs (IG). While contrastive learning has emerged as a powerful paradigm
for integrating these dual information sources, we identify a critical limitation
in existing approaches: current methods fail to effectively balance the contrastive
views derived from IG and KG, often resulting in performance degradation compared
to using IG alone. To address this fundamental challenge, we propose SimKGCL, a novel
contrastive learning framework that introduces a simple yet principled solution --
cross-view, layer-wise fusion between IG and KG representations prior to contrastive
learning. This design ensures effective knowledge transfer while maintaining the discriminative
power of contrastive objectives. Comprehensive experiments across three real-world
benchmarks demonstrate that our approach not only consistently outperforms existing
methods but also achieves remarkable efficiency gains. Our code is available through
this link: https://figshare.com/articles/conference_contribution/SimKGCL/22783382.
SG-Filter: Enhancing Similar Text Retrieval via Hierarchical Summarized-Semantic Index
and Adaptive Filtering
- Jiancai Ye
- Jun Liu
- Haoyu Zhang
- Maojia Sheng
- Tao Yang
- Jiaming Xu
- Jinhao Li
- Yu Wang
- Guohao Dai
Similar Text Retrieval (STR) is an essential scenario in the field of information retrieval (IR).
Unfortunately, existing mainstream vector-based retrieval methods cannot meet the recall rate requirements in STR scenarios (with a recall rate of less than 72%). This is because existing works have focused solely on the local information of text segments, that is, the text segments themselves (i.e., semantic information) and the relationships between them (i.e., structured information). Our key insight is that utilizing the global information of text segments (i.e., summarized information, which includes the key expressions of the documents to which the text segments belong and the relationships between documents) is crucial for improving the recall rate in STR, because the distinctiveness of summarized information helps to filter out confusing vectors during retrieval. However, existing methods that use summarized information still face a critical challenge: their vectorization-based approaches fail to effectively model the global relationships in the summarized information, resulting in a further 79% deterioration in recall rate.
To address these challenges, we present SG-Filter, a novel retrieval framework that integrates summarized information by designing a hierarchical summarized-semantic index and an adaptive filtering strategy applied to it. (1) We propose a hierarchical summarized-semantic index by designing a summarized graph to model the summarized information. Specifically, we exploit the global information at both the document and text-segment levels through co-occurrence relationships and semantic associations. (2) We propose an adaptive filtering strategy that automatically determines which summarized words to filter for each retrieval, enabling effective utilization of summarized information. (3) To ensure robustness and low retrieval latency, we propose a multi-path merge recall strategy that combines summarized and semantic information at varying proportions, and develop an efficient vector retrieval method with filtering conditions. Experiments show that SG-Filter significantly increases the recall rate by 10.53% to 22.92% on average compared with existing vector-based retrieval methods in STR. SG-Filter also keeps retrieval latency within tens of milliseconds. The code is open-sourced at https://github.com/strong-leaf/SG-Filter.
Towards Instance-wise Personalized Federated Learning via Semi-Implicit Bayesian Prompt
Tuning
- Tiandi Ye
- Wenyan Liu
- Kai Yao
- Lichun Li
- Shangchao Su
- Cen Chen
- Xiang Li
- Shan Yin
- Ming Gao
Federated learning (FL) is a privacy-preserving machine learning paradigm that enables
collaborative model training across multiple distributed clients without disclosing
their raw data. Personalized federated learning (pFL) has gained increasing attention
for its ability to address data heterogeneity. However, most existing pFL methods
assume that each client's data follows a single distribution and learn one client-level
personalized model for each client. This assumption often fails in practice, where
a single client may possess data from multiple sources or domains, resulting in significant
intra-client heterogeneity and suboptimal performance. To tackle this challenge, we
propose pFedBayesPT, a fine-grained instance-wise pFL framework based on visual prompt
tuning. Specifically, we formulate instance-wise prompt generation from a Bayesian
perspective and model the prompt posterior as an implicit distribution to capture
diverse visual semantics. We derive a variational training objective under the semi-implicit
variational inference framework. Extensive experiments on benchmark datasets demonstrate
that pFedBayesPT consistently outperforms existing pFL methods under both feature
and label heterogeneity settings.
Exploring Iterative Refinement for Nested Named Entity Recognition with IoU-aware
Denoising Diffusion
- Qiaoxuan Yin
- Jianquan Ouyang
- Huanrong Tang
Named entity recognition (NER) is a key task in natural language processing, but existing
methods often fail to effectively handle nested structures due to fuzzy entity boundaries
and structural ambiguity. To address this challenge, we propose a novel nested NER
method based on an IoU-aware denoising diffusion model, which formulates the nested
NER task as a generative denoising process that progressively recovers gold entity
spans from noisy span proposals. We generate noisy samples during training by gradually
adding Gaussian noise to the ground-truth entity boundaries. We then train a denoiser
incorporating a top-k selective attention mechanism to refine entity span proposals
iteratively. To strengthen the alignment between boundary localization and entity
classification, we introduce an IoU-aware loss function that optimizes the overlap
between predicted and ground-truth spans. This design more accurately guides boundary
regression and effectively reduces misalignment caused by conventional regression
losses. Our model leverages sentence features and timesteps as conditional inputs
to capture contextual information throughout the denoising process. During inference,
the model generates final entity predictions by starting from random noise spans and
iteratively refining them through a multi-step reverse diffusion process. We conduct
extensive experiments on four nested NER datasets, ACE2004, ACE2005, GENIA, and KBP2017,
as well as two flat NER datasets, CoNLL2003 and OntoNotes. Experimental results show
that the proposed method consistently outperforms existing advanced models across
all benchmarks, demonstrating its effectiveness.
Multi-Armed Bandits with Biased and Heteroscedastic Auxiliary Rewards
We study the multi-armed bandits with auxiliary rewards problem, in which pulling
an arm yields not only a primary reward but also a set of auxiliary rewards, which
represent low-quality data. The auxiliary reward distributions can be biased
and have higher variances than the primary reward distribution. We analyze the regret
lower bound with general-order cumulative volume function, and deduce the conditions
under which an algorithm can outperform the classical optimal regret bound without
auxiliary rewards, attained by the state-of-the-art (SOTA) algorithm Asymptotically-Optimal-UCB
(AO-UCB). Then we propose the BVA-MIN-UCB algorithm, which carefully incorporates
the primary and auxiliary rewards by adjusting the potential biases and different
variances. We show that BVA-MIN-UCB always performs no worse than AO-UCB asymptotically,
and nearly matches the regret lower bound, even up to a constant factor. Finally,
we conduct numerical experiments to demonstrate the effectiveness of our algorithm.
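One way to build intuition for mixing primary and auxiliary rewards is to take the tighter of two confidence indices, inflating the auxiliary one by a bias allowance; the sketch below is an illustrative stand-in, not the BVA-MIN-UCB algorithm, and its parameters are assumptions.

```python
import numpy as np

def ucb_index(mean: float, n: int, t: int, scale: float = 1.0) -> float:
    """Standard UCB exploration index."""
    return mean + scale * float(np.sqrt(2.0 * np.log(max(t, 2)) / max(n, 1)))

def combined_index(p_mean, p_n, a_mean, a_n, t, bias_bound, a_scale):
    """Take the tighter of the primary index and a bias-inflated auxiliary index.

    bias_bound and a_scale stand in for the known bias and higher variance of the
    auxiliary rewards; both values are illustrative assumptions.
    """
    primary = ucb_index(p_mean, p_n, t)
    auxiliary = ucb_index(a_mean, a_n, t, scale=a_scale) + bias_bound
    return min(primary, auxiliary)

# Example: few primary pulls, many cheap auxiliary samples for the same arm.
print(combined_index(p_mean=0.4, p_n=5, a_mean=0.5, a_n=50, t=100,
                     bias_bound=0.1, a_scale=2.0))
```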
DPT: Dynamic Preference Transfer for Cross-Domain Sequential Recommendation
- Xiang Ying
- Rui Ding
- Yue Zhao
- Mei Yu
- Mankun Zhao
Cross-domain sequential recommendation aims to generate accurate recommendations by
leveraging users' historical interactions across domains. However, existing methods
have two limitations: 1) When transferring users' preferences from the source domain,
they encode preferences into a static and holistic representation, ignoring the rich
information inherent in the dynamic evolution of user preferences over time; 2) They
adopt a distribution-agnostic full-transfer strategy, failing to effectively limit
the transfer degree of source-domain preferences according to different data distributions,
which poses a risk of negative transfer. To address these issues, we propose the Dynamic
Preference Transfer (DPT) model. Unlike existing methods, DPT places greater emphasis
on the dynamic transfer of real-time preferences. First, DPT captures the causal features
through the causal self-attention mechanism, and then realizes dynamic preference
transfer at each time step via the causal cross-attention mechanism, thereby tracking
the temporal dynamics of preferences from source domains. Second, to mitigate the
negative transfer issue, a temperature-controlled mechanism is designed to adaptively
balance source and target domain preferences, leveraging a temperature-controlled
sigmoid function to effectively suppress interference from irrelevant preferences.
Experimental results on multiple benchmark datasets show that the proposed method
achieves significant performance improvements compared with the state-of-the-art (SOTA)
methods, verifying its effectiveness and superiority. The code is available at https://github.com/iryand/DPT.
FAIR-SE: Framework for Analyzing Information Disparities in Search Engines with Diverse
LLM-Generated Personas
- Jaebeom You
- Seung-Kyu Hong
- Ling Liu
- Kisung Lee
- Hyuk-Yoon Kwon
Search engine personalization, while enhancing user satisfaction, can lead to information
disparities. Previous studies on this topic face limitations, such as the absence
of context-aware data collection, superficial URL-level analysis, and human-dependent
annotations. We propose FAIR-SE, a Framework for Analyzing Information dispaRities
in Search Engines that addresses these challenges through AWS Lambda-based concurrent
data collection and LLM-generated persona-based content analysis. We collected search
results across four user contexts (Search History, Geo-location, Language Preference,
and Access Environment) and analyzed them through four analytical perspectives (Political
Leaning, Topic-specific Stance, Subjectivity, and Bias). Experiments conducted on
two globally prominent search engines across nine controversial topics demonstrate
the efficacy of FAIR-SE regarding benchmark accuracy, persona consistency, and ability
to reflect real-world discourse patterns across diverse topics. Our statistical analysis
identifies distinct search engine characteristics and demonstrates significant information
disparities in our case studies examining regional disparities in search results.
Our code and datasets are publicly available at: https://github.com/bigbases/FAIR-SE.
Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and
Empirical Study
- Shuo Yu
- Mingyue Cheng
- Qi Liu
- Daoyu Wang
- Jiqian Yang
- Jie Ouyang
- Yucong Luo
- Chenyi Lei
- Enhong Chen
Retrieval-augmented generation (RAG) is increasingly recognized as an effective approach
to mitigating the hallucination of large language models (LLMs) through the integration
of external knowledge. Despite numerous efforts, most studies focus on a single type of external knowledge source. However, real-world applications usually involve diverse knowledge from various sources, and this setting has been less explored. The main obstacle is the lack of a suitable dataset containing multiple knowledge sources, together with the lack of preliminary exploration of the associated issues. To address these challenges, we standardize
a benchmark dataset that combines structured and unstructured knowledge across diverse
and complementary domains. Based on this dataset, we further develop a plug-and-play
RAG framework, PruningRAG, whose main characteristic is the use of multi-granularity pruning strategies to optimize
the integration of relevant information while minimizing misleading context. It consistently
improves performance across various existing RAG variants, demonstrating its robustness
and broad applicability. Building upon the standardized dataset and PruningRAG, we
also report a series of experimental results, as well as insightful findings. Our
dataset and code are publicly available at https://github.com/USTCAGI/PruningRAG, with the aim of advancing future research in the RAG community.
Decoder-only Pre-training Enhancement for Spatio-temporal Traffic Forecasting
- Tao Yu
- Junhong Wan
- Yao Fu
- Weihao Jiang
- Jiang Zhu
Although spatio-temporal graph neural networks (STGNNs) have become widely used methods
in traffic forecasting, they still encounter an issue named short-sightedness. Specifically,
due to high model complexity and GPU memory usage, STGNNs are restricted to processing
only very short input time series. This limited context often causes STGNNs to focus
on local variations and overlook long-term patterns, leading to misinterpretation
of time series trends. To tackle this issue, recent studies propose to perform mask
reconstruction pre-training on traffic series to enhance STGNNs. However, we argue
that mask reconstruction is a suboptimal pre-training paradigm for traffic forecasting,
because there exists a great gap between pre-training and downstream forecasting,
caused by their inconsistent training targets. To eliminate this gap, we propose a
new pre-training paradigm named next patch prediction and prove its advantages from both empirical and theoretical perspectives. Based
on this paradigm, we introduce a new framework called Decoder-only Pre-training Enhancement (DoP) to unleash the potential of traffic pre-training models.
Specifically, DoP uses Transformer decoders as infrastructure, and leverages next
patch prediction as target to conduct pre-training. In addition, we propose a new
dual-view temporal embedding to fully capture temporal information and spatial spectral
enhancement to model spatial information. After pre-training, DoP enhances existing
STGNNs seamlessly with a periodic enhancement mechanism. On four real-world traffic benchmarks, we demonstrate its state-of-the-art performance.
StepTool: Enhancing Multi-Step Tool Usage in LLMs via Step-Grained Reinforcement Learning
- Yuanqing Yu
- Zhefan Wang
- Weizhi Ma
- Shuai Wang
- Chuhan Wu
- Zhiqiang Guo
- Min Zhang
Despite their powerful text generation capabilities, large language models (LLMs)
still struggle to effectively utilize external tools to solve complex tasks, a challenge
known as tool learning. Existing methods primarily rely on supervised fine-tuning,
treating tool learning as a text generation problem while overlooking the decision-making
complexities inherent in multi-step contexts. In this work, we propose modeling tool
learning as a dynamic decision-making process and introduce StepTool, a novel step-grained reinforcement learning framework that enhances LLMs' capabilities
in multi-step tool use. StepTool comprises two key components: Step-grained Reward Shaping, which assigns rewards to each tool interaction based on its invocation success and
contribution to task completion; and Step-grained Optimization, which applies policy gradient methods to optimize the model across multiple decision
steps. Extensive experiments across diverse benchmarks show that StepTool consistently
outperforms both SFT-based and RL-based baselines in terms of task Pass Rate and Recall of relevant tools. Furthermore, our analysis suggests that StepTool helps models
discover new tool-use strategies rather than merely re-weighting prior knowledge.
These results highlight the importance of fine-grained decision modeling in tool learning
and establish StepTool as a general and robust solution for enhancing multi-step tool
use in LLMs. Code and data are available at https://github.com/yuyq18/StepTool.
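A toy sketch of step-grained reward shaping is given below: each tool call receives a local success bonus plus a discounted share of the final task reward. The bonus value and discounting scheme are assumptions for illustration, not StepTool's exact reward definition.

```python
from typing import List

def step_rewards(call_success: List[bool], task_reward: float,
                 success_bonus: float = 0.1, gamma: float = 0.95) -> List[float]:
    """Assign each tool call a success bonus plus a discounted share of the final task reward."""
    T = len(call_success)
    rewards = []
    for t, ok in enumerate(call_success):
        contribution = (gamma ** (T - 1 - t)) * task_reward   # later steps get more credit
        rewards.append((success_bonus if ok else 0.0) + contribution)
    return rewards

# Example: three tool calls, the second one failed, and the task eventually succeeded.
print(step_rewards([True, False, True], task_reward=1.0))
```

Such per-step rewards can then be plugged into a standard policy-gradient update in place of a single trajectory-level reward.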
Temporal Blocks with Memory Replay for Dynamic Graph Representation Learning
- Zhigang Yu
- Hao Yan
- Ruochen Liu
- Xianghan Wang
- Haijun Zhang
- Senzhang Wang
Dynamic graph representation learning (DGRL) aims to model the temporal evolution
of graph structure and attributes, thereby generating low-dimensional node representations
at different time steps. Most prevailing snapshot-based methods construct snapshots
independently in time, assigning each interaction to a single snapshot. However, such
a design limits the ability to capture long-range temporal patterns, leading to the
forgetting of prior interactions and reducing the capacity of the model to recognize
causal dependencies across events. To address this issue, we construct temporal blocks
with the memory replay mechanism by sequentially merging several adjacent snapshots
to capture long-range temporal patterns and causal dependencies over time. Building
on this, we propose a novel dynamic graph representation learning model named TBD.
Specifically, the model first encodes each temporal block using a graph neural network
(GNN), and then captures cross-block dynamics through a Multi-Feature Gated Recurrent
Unit (MF-GRU) that incorporates structural embeddings and a feature-aware gating mechanism
to adapt to evolving graph structures. Furthermore, we introduce a Structure-Aware
Node Smoothness Constraint (SA-NSC) to enforce temporal consistency while retaining
adaptability to structural changes. Extensive experiments on multiple real-world datasets
demonstrate that TBD consistently achieves superior performance, validating its effectiveness
and robustness.
PKGRec: Personal Knowledge Graph Construction and Mining for Federated Recommendation
Enhancement
- Haochen Yuan
- Yang Zhang
- Quan Z. Sheng
- Lina Yao
- Yipeng Zhou
- Xiang He
- Zhongjie Wang
Personal Knowledge Graphs (PKGs) organize an individual user's information into a
structured format comprising entities, attributes, and relationships. By leveraging
this structured and semantically rich data, PKGs have become essential for securing
personal data management and delivering personalized services. To unlock their potential
in personalized recommendations, prior research has explored the construction of PKGs
and recommendation methods built upon them. However, these studies often overlook
challenges associated with distributed PKGs across different users, such as joint
training and privacy protection. To address these challenges, we propose PKGRec, a
federated graph recommendation method specifically designed for PKGs, which utilizes
a federated learning framework to ensure user privacy and data security during joint
learning. Furthermore, to accommodate the user-centric graph structure of PKGs, our
approach categorizes entities into three types: users, items, and other entities.
It then applies a novel staged graph convolution method to model various entities
based on these entity categories during local training. To enable efficient graph
information sharing among distributed PKGs without requiring additional data transfer
or aggregation, PKGRec performs graph expansion on the trained gradients by federated
aggregation. Extensive experiments conducted on four publicly available datasets demonstrate
that our method consistently outperforms the existing federated recommendation approaches.
SEF-UQR: Scalable and Efficient Privacy-Preserving Federated Updating QR Factorization
- Haonan Yuan
- Wenyuan Wu
- Jingwei Chen
Applications in real-time machine learning and data analysis often require incremental
updates to matrix decompositions as new data arrive. This capability is particularly
crucial for streaming PCA, online learning, and iterative optimization algorithms,
where data are continuously generated from distributed sources. However, privacy constraints
prevent direct data sharing among participants, making collaborative QR decomposition
updates challenging. To address this, we present SEF-UQR, a scalable and efficient
framework for federated QR updates that focuses on incremental row-addition updates-common
in streaming-data scenarios-while leveraging homomorphic encryption and interactive
ciphertext protocols to protect both inputs and intermediate computations. SEF-UQR
achieves accuracy on par with insecure recomputation, maintaining a mean squared error
(MSE) below 1e-12. Empirical results demonstrate that SEF-UQR delivers at least a
10× runtime improvement over existing state-of-the-art methods employing fully homomorphic
encryption, confirming its effectiveness for privacy-sensitive, real-time federated
data analysis.
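For context, the plaintext version of the row-addition QR update that SEF-UQR performs under encryption can be expressed directly with SciPy; the example below shows only the numerical update, with no privacy protection involved.

```python
import numpy as np
from scipy.linalg import qr, qr_insert

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
Q, R = qr(A)                                   # initial (full-mode) QR factorization

# A new data row arrives: update the factors instead of recomputing from scratch.
new_row = rng.standard_normal((1, 3))
Q1, R1 = qr_insert(Q, R, new_row, k=A.shape[0], which="row")   # append at the bottom

A_updated = np.vstack([A, new_row])
print(np.allclose(Q1 @ R1, A_updated))         # True, up to floating-point error
```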
Aggregated Gradients-based Adaptive Learning Rate Design in Federated Learning
Federated Learning (FL) has emerged as a crucial distributed training paradigm, enabling individual devices
to collaboratively train a shared model while leveraging their locally stored private
data. However, the non-independent-and-identically-distributed (Non-IID) data on heterogeneous
clients may significantly impede training efficacy. In our study, we present a novel
algorithm designed to alleviate client drifting on Non-IID data and enhance model
performance, termed FedAgile (Aggregated Gradients-based AdaptIve LEarning Rate Design in FEDerated Learning),
which designs the adaptive learning rate by introducing an aggregated gradient term
to accelerate model convergence and mean-field terms to approximate the average local
information over time. We refine the learning rate based on Jensen-Shannon (JS) Distance to enhance the generalization capability. Through rigorous theoretical analysis,
we establish the existence and convergence analysis of the mean-field terms, which
can be efficiently calculated via our proposed iterative algorithm with linear computational
complexity. Further, we provide a robust upper bound on the convergence of FedAgile and prove that our algorithm achieves a convergence rate of Õ(T^{-1}). Extensive experimental results on real-world datasets substantiate the superiority of our proposed FedAgile in comparison with existing state-of-the-art FL strategies; moreover, FedAgile can be easily incorporated into existing methods to further enhance model performance.
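As a loose illustration of coupling the learning rate to a Jensen-Shannon measure of client drift, the sketch below shrinks a client's step size as its gradient profile diverges from the aggregate; the choice of distributions and scaling is an assumption for intuition, not FedAgile's actual rule.

```python
import numpy as np

def js_distance(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Jensen-Shannon distance between two discrete distributions."""
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log((a + eps) / (b + eps))))
    return float(np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m)))

def adaptive_lr(base_lr: float, local_grad: np.ndarray, agg_grad: np.ndarray) -> float:
    """Shrink a client's learning rate as its gradient "distribution" drifts from the aggregate.

    The distributions here are normalized absolute gradient magnitudes, used only as an
    illustrative proxy for a JS-based refinement of the learning rate.
    """
    drift = js_distance(np.abs(local_grad), np.abs(agg_grad))   # in [0, sqrt(ln 2)]
    return base_lr / (1.0 + drift)

print(adaptive_lr(0.1, np.array([0.5, -0.1, 0.2]), np.array([0.3, 0.3, 0.1])))
```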
TriSeRec: A Tri-view Representation Learning Framework for Sequential/Session-based
Recommendation
Sequential/session-based recommendation models aim to learn evolving user preferences
from historical user behaviors. State-of-the-art sequential/session-based recommendation
models often use graph neural networks or self-attention as their building blocks.
Graph neural networks excel at learning local patterns encoded in graph-structured
data and have therefore shown great performance on session-based recommendation datasets,
where user interactions are usually relatively short. Self-attentive models, on the
other hand, are much more powerful in capturing long-range dependencies and are able
to outperform graph neural network-based approaches on sequential recommendation,
where longer user interactions are more frequent. As such, the recommender systems
community has noted a lack of a unified framework that can simultaneously achieve
great performance on both sequential and session-based recommendation. In an effort
to fill this gap, this paper presents TriSeRec, a Tri-view representation learning
framework for Sequential/session-based Recommendation. By converting interaction sequences
into two graphical views and one sequential view, three view-specific user representations
are learned by TriSeRec using graph neural networks and self-attention. The tri-view
representation learning module, which is built upon the recently proposed generalized
Cauchy-Schwarz divergence, disentangles and then fuses consistent and complementary
information in all three views to form the final user representations for next-item
predictions. Experiments on popular large-scale, real-world benchmark datasets show
that TriSeRec achieves state-of-the-art performance on both sequential recommendation
and session-based recommendation.
A Comparative Analysis of Linguistic and Retrieval Diversity in LLM-Generated Search
Queries
- Oleg Zendel
- Sara Fahad Dawood Al Lawati
- Lida Rashidi
- Falk Scholer
- Mark Sanderson
Large Language Models (LLMs) are increasingly used to generate search queries for
various Information Retrieval (IR) tasks. However, it remains unclear how these machine-generated
queries compare to human-written ones, particularly in terms of diversity and alignment
with real user behavior. This paper presents an empirical comparison of LLM- and human-generated
queries across multiple dimensions, including lexical diversity, linguistic variation,
and retrieval effectiveness. We analyze queries produced by several LLMs and compare
them with human queries from two datasets collected five years apart. Our findings
show that while LLMs can generate diverse queries, their patterns differ from those
observed in human behavior. LLM queries typically exhibit higher surface-level uniqueness
but rely less on stopword use and word form variation. They also achieve lower retrieval
effectiveness when judged against human queries, suggesting that LLM-generated queries
may not always reflect real user intent. These differences highlight the limitations
of current LLMs in replicating natural querying behavior. We discuss the implications
of these findings for LLM-based query generation and user behavior simulation in IR.
We conclude that while LLMs hold potential, they should be used with caution.
Querier-Aware LLM: Generating Personalized Responses to the Same Query from Different
Queriers
- Hang Zeng
- Chaoyue Niu
- Fan Wu
- Chengfei Lv
- Guihai Chen
Existing work on large language model (LLM) personalization assigned different responding
roles to LLMs, but overlooked the diversity of queriers. In this work, we propose
a new form of querier-aware LLM personalization, generating different responses even
for the same query from different queriers. We design a dual-tower model architecture
with a cross-querier general encoder and a querier-specific encoder. We further apply
contrastive learning with multi-view augmentation, pulling close the dialogue representations
of the same querier, while pulling apart those of different queriers. To mitigate
the impact of query diversity on querier-contrastive learning, we cluster the dialogues
based on query similarity and restrict the scope of contrastive learning within each
cluster. To address the lack of datasets designed for querier-aware personalization,
we also build a multi-querier dataset from English and Chinese scripts, as well as
WeChat records, called MQDialog, containing 173 queriers and 12 responders. Extensive
evaluations demonstrate that our design significantly improves the quality of personalized
response generation, achieving relative improvements of 8.4% to 48.7% in ROUGE-L scores
and winning rates ranging from 54% to 82% compared with various baseline methods.
CityLight: A Neighborhood-inclusive Universal Model for Coordinated City-scale Traffic
Signal Control
- Jinwei Zeng
- Chao Yu
- Xinyi Yang
- Wenxuan Ao
- Qianyue Hao
- Jian Yuan
- Yong Li
- Yu Wang
- Huazhong Yang
City-scale traffic signal control (TSC) involves thousands of heterogeneous intersections
with varying topologies, making cooperative decision-making across intersections particularly
challenging. Given the prohibitive computational cost of learning individual policies
for each intersection, some researchers explore learning a universal policy to control
each intersection in a decentralized manner, where the key challenge is to construct
a universal representation method for heterogeneous intersections. However, existing
methods are limited to universally representing information of heterogeneous ego intersections,
neglecting the essential representation of influence from their heterogeneous neighbors.
Universally incorporating neighborhood information is nontrivial due to the intrinsic
complexity of traffic flow interactions, as well as the challenge of modeling collective
influences from neighbor intersections. To address these challenges, we propose CityLight,
which learns a universal policy based on representations obtained with two major modules:
a Neighbor Influence Encoder to explicitly model each neighbor's influence based on its specified traffic flow relation and connectivity to the ego intersection; and a Neighbor Influence
Aggregator to attentively aggregate the influence of neighbors based on their mutual
competitive relations. Extensive experiments on five city-scale datasets, ranging
from 97 to 13,952 intersections, confirm the efficacy of CityLight, with an average
throughput improvement of 11.68% and a lift of 22.59% for generalization. Our codes
and datasets are released: https://github.com/tsinghua-fib-lab/CityLight.
SGPT: Few-Shot Prompt Tuning for Signed Graphs
- Zian Zhai
- Qing Sima
- Xiaoyang Wang
- Wenjie Zhang
Signed Graph Neural Networks (SGNNs) are effective in learning expressive representations
for signed graphs but typically require substantial task-specific labels, limiting
their applicability in label-scarce industrial scenarios. In contrast, unsigned graph
structures are abundant and can be readily leveraged to pre-train Graph Neural Networks
(GNNs), offering a promising solution to reduce supervision requirements in downstream
signed graph tasks. However, transferring knowledge from unsigned to signed graphs
is non-trivial due to the fundamental discrepancies in graph types and task objectives
between pre-training and downstream phases. To address this challenge, we propose
Signed Graph Prompt Tuning (SGPT), a novel graph prompting framework that adapts pre-trained
unsigned GNNs to few-shot signed graph tasks. We first design a graph template based
on balance theory to disentangle mixed node relationships introduced by negative links,
mitigating the structural mismatches between unsigned and signed graphs. We further
introduce a task template that reformulates downstream signed tasks into a unified
link prediction objective, aligning their optimization goals with the pre-training
task. Furthermore, we develop feature prompts that align downstream semantic spaces
with the feature spaces learned during pre-training, and semantic prompts to integrate
link sign semantics in a task-aware manner. We conduct extensive experiments on seven
benchmark signed graph datasets, demonstrating that SGPT significantly outperforms
existing state-of-the-art methods, establishing a powerful and generalizable solution
for few-shot signed graph learning.
DSETA: Driving Style-Aware Estimated Time of Arrival
The accurate estimated time of arrival (ETA) is crucial for mobility and transportation
applications. Although significant efforts have been made to improve ETA prediction,
most existing approaches ignore the influence of individual driving habits and preferences,
known as the driving style. Since different drivers may prefer specific routes and
speeds based on their experience and familiarity with traffic conditions, driving
styles play a crucial role in determining the actual ETA. To fill this gap, we present
a novel approach, DSETA, which leverages deep learning to learn and then integrate
driving style representations for personalized and precise ETA predictions. Our method
employs a diffusion model that captures nuanced driving styles by generating driving
speed distribution. We also utilize attention mechanisms to dynamically adjust the
impacts of various spatio-temporal factors and driving styles on ETA predictions.
Additionally, we introduce a Multi-View Multi-Task framework that incorporates auxiliary
tasks, including segment-view driving style classification and route-view speed distribution
prediction, to enhance the ETA learning process. A route-level speed prior regularization
strategy further improves the model's generalization capabilities. Extensive experiments
conducted on a large real-world trip trajectory dataset demonstrate that DSETA achieves
high effectiveness and outperforms various baselines across multiple evaluation metrics.
A Hierarchical Structure-Enhanced Personalized Recommendation Model for Traditional
Chinese Medicine Formulas Based on KG Diffusion Guidance
Artificial intelligence (AI) technology plays a crucial role in recommending prescriptions
for traditional Chinese medicine (TCM). Previous studies have made significant progress
by focusing on the symptom-herb relationship in prescriptions. However, several limitations
hinder model performance: (i) Insufficient attention to patient-personalized information
such as age, BMI, and medical history, which hampers accurate identification of syndrome
and reduces efficacy. (ii) The typical long-tailed distribution of herb data introduces
training biases and affects generalization ability. (iii) The oversight of the 'monarch,
minister, assistant and envoy' compatibility among herbs increases the risk of toxicity
or side effects, opposing the 'treatment based on syndrome differentiation' principle
in clinical TCM. Therefore, we propose a novel hierarchical structure-enhanced personalized
recommendation model for TCM formulas based on knowledge graph (KG) diffusion guidance,
namely TCM-HEDPR. Specifically, we pre-train symptom representations using patient-personalized
prompt sequences and apply prompt-oriented contrastive learning (CL) for data augmentation.
Furthermore, we employ a KG-guided homogeneous graph diffusion method integrated with
a self-attention mechanism to globally capture the non-linear symptom-herb relationship.
Lastly, we design a heterogeneous graph hierarchical network to integrate herbal dispensing
relationships with implicit syndromes, guiding the prescription generation process
at a fine-grained level and mitigating the long-tailed herb data distribution problem.
Extensive experiments on two public datasets and one clinical dataset demonstrate
the effectiveness of TCM-HEDPR. In addition, we incorporate insights from modern medicine
and network pharmacology to evaluate the recommended prescriptions comprehensively.
It can provide a new paradigm for the recommendation of modern TCM.
HRCformer: Hierarchical Recursive Convolution-Transformer with Multi-Scale Adaptive
Recalibration for Time Series Forecasting
- Dejiang Zhang
- Lianyong Qi
- Yuwen Liu
- Xucheng Zhou
- Jianye Xie
- Haolong Xiang
- Xiaolong Xu
- Xuyun Zhang
- Yang Cao
- Yang Zhang
Time series forecasting has significant applications across various domains, including
industry, agriculture, and finance. Transformer-based models have shown significant
promise in enhancing time series forecasting over the past few years. However, existing
methods struggle to simultaneously capture local details and global semantics under
single-view architectures. They also find it difficult to dynamically adapt to time-varying
and multi-scale temporal patterns while accurately modeling the complex, time-varying
relationships between multiple variables. To address these challenges, we propose
HRCformer, a novel Transformer-based framework that introduces two key innovations:
the Hierarchical Recursive Interaction Convolution (HRIC) and the Triad Adaptive Recalibration
Module (TARM). HRIC achieves joint modeling of fine-grained short-term fluctuations
and high-order cross-period dependencies in time series by integrating Divide-and-Process
Convolution for local processing with Recursive Channel Interaction Convolution for
global processing. TARM further enhances dynamic modeling via Dynamic Variance Attention,
which amplifies critical temporal deviations through 3D attention, and the Adaptive
Multivariate Recalibration, which uses a two-layer fully connected network with nonlinear
activation to learn the dynamic relationships between channels, suppresses noise,
and emphasizes informative multivariate interactions. Comprehensive experiments conducted
on seven real-world datasets highlight the superiority of HRCformer compared to prior
state-of-the-art methods.
EvalAgent: Towards Evaluating News Recommender Systems with LLM-based Agents
- Guangping Zhang
- Peng Zhang
- Jiahao Liu
- Zhuoheng Li
- Dongsheng Li
- Hansu Gu
- Tun Lu
- Ning Gu
Online news platforms have become the primary source of information consumption, with
recommender systems serving as critical gateways that shape public discourse through
their algorithmic power, necessitating rigorous evaluation methodologies. Traditional
offline evaluation methods struggle with evolving user behavior and dynamic system
adaptation, while online experiments are costly, time-consuming, and ethically challenging.
To address these challenges, this paper introduces EvalAgent, a large language model
agent system for simulating real-world online news recommender systems. EvalAgent
employs Stable Memory (StM) to model users' exploration-exploitation dynamics, mitigating
noise from irrelevant interactions by analyzing the distribution density of news articles
within the short-term memory, and incrementally maintains the long-term memory to
capture users' high-level preferences, thereby enabling a consistent and reliable
simulation of sustained interactions. It further incorporates an Environment Interaction
Framework (EIF) to enable seamless engagement with real-world recommender systems.
This approach yields a precise, scalable, and ethically responsible evaluation framework
for news recommender systems. Comprehensive experiments and user studies substantiate
EvalAgent's efficacy, with publicly available code to support ongoing research in
recommender system evaluation.
SpeedSteiner: A Fast O(k^{1/2})-Approximation Algorithm for Directed Steiner Tree
- Guangyi Zhang
- Nikolaj Tatti
- Aristides Gionis
The directed Steiner tree problem is fundamental in computer science with numerous
applications. However, to date, there are no efficient algorithms with quality guarantees.
In this paper, we take on this challenge and offer a fast algorithm with provable
approximation guarantees. We introduce SpeedSteiner, an O(k^{1/2})-approximation algorithm, where k is the number of terminal nodes. In practice, SpeedSteiner can be several orders
of magnitude faster than other methods with a similar approximation ratio. The speedup
is achieved by combining several optimization techniques that exploit the inner structure
of recursive-greedy algorithms. We systematically evaluate the proposed algorithm
and verify its scalability and strong empirical performance.
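As context for the problem SpeedSteiner addresses, the sketch below builds a small directed Steiner tree instance and connects the terminals with a naive shortest-path heuristic (attach every terminal via its cheapest path from the root). This is only an illustrative baseline under assumed inputs, not the recursive-greedy algorithm the paper optimizes; the graph, weights, and terminal names are hypothetical.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths on a weighted digraph given as
    {u: [(v, w), ...]}. Returns (dist, parent) dictionaries."""
    dist, parent = {source: 0.0}, {source: None}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, parent

def steiner_tree_baseline(graph, root, terminals):
    """Naive baseline: attach every terminal via its shortest path from the
    root and take the union of the path edges. This is NOT SpeedSteiner,
    only an illustration of the problem instance."""
    dist, parent = dijkstra(graph, root)
    tree_edges = set()
    for t in terminals:
        if t not in dist:
            raise ValueError(f"terminal {t} unreachable from root")
        node = t
        while parent[node] is not None:          # walk back along the path
            tree_edges.add((parent[node], node))
            node = parent[node]
    return tree_edges

# Hypothetical toy instance: adjacency lists of (neighbor, weight) pairs.
g = {"r": [("a", 1), ("b", 4)], "a": [("t1", 1), ("b", 1)], "b": [("t2", 1)]}
print(steiner_tree_baseline(g, "r", ["t1", "t2"]))
```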
Harnessing Commonsense: LLM-Driven Knowledge Integration for Fine-Grained Sentiment
Analysis
Fine-grained sentiment analysis, which aims to identify sentiments associated with
specific aspects within sentences, faces challenges in effectively incorporating commonsense
knowledge. Recent advancements leveraging large language models (LLMs) as data generators
show promise but are limited by the LLMs' lack of nuanced, domain-specific understanding
and pose a significant risk of data leakage during inference, potentially leading
to inflated performance metrics. To address these limitations, we propose LLM-Kit,
a novel framework for commonsense-enhanced fine-grained sentiment analysis that integrates
knowledge via LLM-guided graph construction, effectively mitigating data leakage risks.
LLM-Kit operates in two key stages: (1) Commonsense Graph Construction (CGC): We design
second-order rules and leverage LLMs for evaluation to ensure the accuracy of the
generated graph and mitigate the risk of data leakage from LLMs. (2) Knowledge-integration
Graph Representation Learning (KGRL): We extract aspect-aware knowledge through Graph Representation
Learning (GRL). To capture the underlying semantic nuances within the input sentence,
we develop a Sentence Semantic Learning (SSL) module based on RoBERTa that explicitly
encodes internal semantics. This module provides complementary information to the
GCN, improving the model's ability to discern subtle sentiment variations related
to different aspects. Comprehensive experiments on three public datasets affirm that
LLM-Kit achieves performance comparable to state-of-the-art models.
SyLeR: A Framework for Explicit Syllogistic Legal Reasoning in Large Language Models
- Kepu Zhang
- Weijie Yu
- Zhongxiang Sun
- Jun Xu
Syllogistic reasoning is a fundamental aspect of legal decision-making, enabling logical
conclusions by connecting general legal principles with specific case facts. Although
existing large language models (LLMs) can generate responses to legal questions, they
fail to perform explicit syllogistic reasoning, often producing implicit and unstructured
answers that lack explainability and trustworthiness. To address this limitation,
we propose SyLeR, a novel framework that empowers LLMs to engage in explicit syllogistic
legal reasoning. SyLeR integrates a tree-structured hierarchical retrieval mechanism
to effectively combine relevant legal statutes and precedent cases, forming comprehensive
major premises. This is followed by a two-stage fine-tuning process: supervised fine-tuning
warm-up establishes a foundational understanding of syllogistic reasoning, while reinforcement
learning with a structure-aware reward mechanism refines the model's ability to generate
diverse, logically sound, and well-structured reasoning paths. We conducted extensive
experiments across various dimensions, including in-domain and cross-domain user groups
(legal laypersons and practitioners), multiple languages (Chinese and French), and
different LLM backbones (legal-specific and open-domain LLMs). The results show that
SyLeR significantly improves response accuracy and consistently delivers explicit,
explainable, and trustworthy legal reasoning.
Distribution-Guided Auto-Encoder for User Multimodal Interest Cross Fusion
- Moyu Zhang
- Yongxiang Tang
- Yujun Jin
- Jinxin Hu
- Yu Zhang
Traditional recommendation methods model a user's interest in a target item by correlating
its embedding with the embeddings of items from the user's interaction history, thereby
capturing implicit collaborative filtering signals. Consequently, traditional ID-based
methods often encounter data sparsity problems stemming from the sparse nature of
ID features. To mitigate this issue, recommendation models incorporate multimodal
item information to enhance recommendation accuracy. However, existing multimodal
recommendation methods typically rely on early fusion approaches, which focus primarily
on combining text and image features, while neglecting the dynamic context provided
by user behavior sequences. This oversight precludes the dynamic adaptation of multimodal
interest representations to behavioral patterns, thereby hindering the model's ability
to effectively capture user multimodal interests. Therefore, this paper proposes the
Distribution-Guided Multimodal-Interest Auto-Encoder (DMAE), which achieves cross-fusion of user multimodal interests at the
behavioral level. Specifically, DMAE comprises three key components: 1) Multimodal
Interest Encoding Unit (MIEU), which encodes the similarity scores between the target
item and historically clicked items as the corresponding representation vectors of
user interest across different modalities. 2) Multimodal Interest Fusion Unit (MIFU),
which dynamically adapts these interest representations through both intra- and inter-modal
fusion, a process contextualized by the user's behavioral sequence to achieve a fine-grained
and behavior-aware representation of interest. 3) Interest-Distribution Decoding Unit
(IDDU), which employs a decoder to reconstruct the encoded user interest representations
into true similarity distributions for each modality. The similarity distributions
serve as a guide for model learning, aiming to retain as much multimodal information
as possible. Ultimately, extensive experiments demonstrate the superiority of DMAE.
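The Multimodal Interest Encoding Unit is described as encoding target-to-history similarity scores per modality; a minimal sketch of that reading, using plain cosine similarities over hypothetical text and image embeddings, is given below. The function name, shapes, and modalities are assumptions, not the paper's exact module.

```python
import torch

def multimodal_interest_encoding(target, history):
    """MIEU-style sketch (an assumption, not the paper's exact module): for
    each modality, the user-interest vector is the sequence of cosine
    similarities between the target item and the historically clicked items.
    target: {modality: (d,)}, history: {modality: (L, d)} -> (M, L) tensor."""
    rows = []
    for modality in sorted(target):
        t = torch.nn.functional.normalize(target[modality], dim=-1)
        h = torch.nn.functional.normalize(history[modality], dim=-1)
        rows.append(h @ t)                      # (L,) cosine similarities
    return torch.stack(rows)                    # one row per modality

# Hypothetical text/image embeddings for a target item and 10 clicked items.
target = {"text": torch.randn(64), "image": torch.randn(64)}
history = {"text": torch.randn(10, 64), "image": torch.randn(10, 64)}
print(multimodal_interest_encoding(target, history).shape)  # torch.Size([2, 10])
```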
HGAurban: Heterogeneous Graph Autoencoding for Urban Spatial-Temporal Learning
- Qianru Zhang
- Xinyi Gao
- Haixin Wang
- Dong Huang
- Siu-Ming Yiu
- Hongzhi Yin
Spatial-temporal graph representations play a crucial role in urban sensing applications,
including traffic analysis, human mobility behavior modeling, and citywide crime prediction.
However, a key challenge lies in the noisy and sparse nature of spatial-temporal data,
which limits existing neural networks' ability to learn meaningful region representations
in the spatial-temporal graph. To overcome these limitations, we propose HGAurban,
a novel heterogeneous spatial-temporal graph masked autoencoder that leverages generative
self-supervised learning for robust urban data representation. Our framework introduces
a spatial-temporal heterogeneous graph encoder that extracts region-wise dependencies
from multi-source data, enabling comprehensive modeling of diverse spatial relationships.
Within our self-supervised learning paradigm, we implement a masked autoencoder that
jointly processes node features and graph structure. This approach automatically learns
heterogeneous spatial-temporal patterns across regions, significantly improving the
representation of dynamic temporal correlations. Comprehensive experiments across
multiple spatiotemporal mining tasks demonstrate that our framework outperforms state-of-the-art
methods and robustly handles real-world urban data challenges, including noise and
sparsity in both spatial and temporal dimensions.
GCoder: Improving Large Language Model for Generalized Graph Reasoning
- Qifan Zhang
- Xiaobin Hong
- Jianheng Tang
- Nuo Chen
- Yuhan Li
- Wenzhong Li
- Jing Tang
- Jia Li
Large Language Models (LLMs) have demonstrated remarkable progress across a variety
of reasoning tasks. Among these, graph-related tasks, which require the integration
of multiple reasoning paradigms and whose complexity increases with graph size, present
unique challenges for LLMs. Existing research primarily centers on chain-of-thought
(CoT) methods and code-based approaches. While CoT methods have shown promise in many
domains, they often underperform in graph reasoning due to unverifiable reasoning
steps, computational inaccuracies, and limited generalization capabilities. Code-based
approaches, which leverage LLMs to generate executable programs, offer an alternative
paradigm by offloading complex computations to external tools. However, current LLMs
still face challenges related to closed-source restrictions, deployment difficulties,
code quality, and generalization. To address these limitations, we propose GCoder,
a code-based LLM specifically designed to enhance performance in generalized graph
reasoning tasks. Our approach includes the construction of a comprehensive training
dataset, GraphWild, which encompasses a wide range of graph formats and algorithms.
We employ a multi-stage post-training process, incorporating Supervised Fine-Tuning
(SFT) and Reinforcement Learning from Compiler Feedback (RLCF), to further refine
the model's capabilities. For unseen tasks, a hybrid retrieval strategy is utilized
to boost performance. Experimental results demonstrate that GCoder outperforms GPT-4o,
achieving an average accuracy improvement of 13.29% across 13 different graph reasoning
problems. Additionally, GCoder efficiently handles large-scale graphs with millions
of nodes and diverse input formats. This advancement paves the way for more intuitive
and effective graph reasoning using LLMs. Our code is available at https://github.com/Bklight999/GCoder.
IPNet: An Interaction Pattern-aware Neural Network for Temporal Link Prediction
- Qingyang Zhang
- Yitong Wang
- Xinjie Lin
Temporal link prediction, which aims to predict the future status of edges between
target nodes, is vital for current prevalent online services. Most existing methods
ignore node-level behavior patterns, which play a decisive role in temporal link prediction,
as nodes that behave similarly are more likely to interact in the future. In this
paper, we propose a novel continuous-time model, the Interaction Pattern-aware neural
Network (IPNet), to capture node-level behavior patterns and network evolution by
encoding interaction sequences and contextual windows. We further devise a random
walk sampling strategy to enhance the extraction of these windows, preserving node-centric
structural evolution. Experimental results on seven real-world networks demonstrate
that IPNet outperforms state-of-the-art methods in both transductive and inductive
link prediction tasks. The code can be accessed via https://github.com/CoderZQY/IPNet.
Advancing Graph Isomorphism Tests with Metric Space Indicators: A Tool for Improving
Graph Learning Tasks
- Shenghui Zhang
- Pak Lon Ip
- Rongqin Chen
- Shunran Zhang
- Leong Hou U
To enhance the capability of Graph Neural Networks (GNNs) in judging graph isomorphism
and graph classification tasks, this paper introduces a metric space-based graph isomorphism
judgment method called the k-MSI test, which offers more topological information than
the k-WL test and demonstrates superior graph isomorphism judgment capabilities compared
to the k-WL test at the same complexity level. On the open graph isomorphism test dataset
BREC, the k-MSI test achieves an accuracy more than 11% higher than the other methods. Furthermore,
building on the k-MSI test, we propose a feature enhancement method, the Node Metric Indicator
(NMI), which supplies additional topological information of graphs to GNNs, and we present
a novel GNN named Metric Space Indicators Graph Neural Network (MSIGNN). Experimental
results on a publicly available benchmark graph classification task indicate that
the NMI feature-based MSIGNN outperforms state-of-the-art methods on the BREC graph
isomorphism test dataset and achieves satisfactory performance on real-world datasets.
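For readers unfamiliar with the k-WL baseline that the k-MSI test is compared against, the sketch below implements standard 1-WL color refinement: differing refined color multisets certify non-isomorphism, while equal multisets are inconclusive. This illustrates only the classical WL test, not the metric space indicators proposed in the paper; the example pair (a 6-cycle versus two triangles) is a well-known case that 1-WL cannot separate.

```python
from collections import Counter

def wl_colors(adj, rounds=3):
    """1-WL color refinement on an undirected graph given as an adjacency
    dict {node: [neighbors]}. Returns the multiset of final colors."""
    color = {v: 0 for v in adj}                       # uniform initial color
    for _ in range(rounds):
        signature = {
            v: (color[v], tuple(sorted(color[u] for u in adj[v])))
            for v in adj
        }
        # Relabel distinct signatures with fresh integer colors.
        palette = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        color = {v: palette[signature[v]] for v in adj}
    return Counter(color.values())

# Hypothetical pair: a 6-cycle vs. two triangles (1-WL cannot separate them).
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(wl_colors(cycle6) == wl_colors(two_triangles))  # True: test is inconclusive
```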
SarRec: Statistically-guaranteed Augmented Retrieval for Recommendation
- Tong Zhang
- Nitin Bisht
- Zihao Li
- Guandong Xu
- Xianzhi Wang
Large Language Models with Retrieval-Augmented Generation (RAG) have recently
emerged as a powerful paradigm for sequential recommendation. However, existing methods
typically retrieve items for each user without any principled mechanism for guaranteeing
the reliability of generated recommendations, limiting their trustworthiness. To address
this, we introduce SarRec: Statistically-guaranteed Augmented Retrieval for Recommendations, a framework that
uses a simple retrieval step to provide relevant context and delivers calibrated,
uncertainty-aware predictions with formal statistical guarantees. Specifically, SarRec
first constructs the user's context set, utilizing a lightweight differentiable retrieval
mechanism for identifying relevant context, and then calibrates the LLM's outputs
by adapting the conformal prediction mechanism. We further provide a theoretical analysis
that establishes an upper bound on the expected risk of recommendation performance
metrics. Extensive experiments on multiple datasets from different domains validate
the effectiveness of our framework.
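The calibration step SarRec adapts is conformal prediction. Since the abstract does not give the exact nonconformity score, the sketch below shows the generic split-conformal recipe on hypothetical LLM item scores: estimate a quantile threshold on a calibration set, then return every candidate item that clears it.

```python
import numpy as np

def conformal_threshold(cal_scores_true, alpha=0.1):
    """Split conformal prediction: cal_scores_true[i] is the model's score for
    the observed next item of calibration user i. Nonconformity = 1 - score.
    Returns a threshold giving roughly (1 - alpha) marginal coverage."""
    n = len(cal_scores_true)
    nonconformity = 1.0 - np.asarray(cal_scores_true)
    q_level = np.ceil((n + 1) * (1 - alpha)) / n       # finite-sample correction
    return np.quantile(nonconformity, min(q_level, 1.0))

def prediction_set(item_scores, threshold):
    """Return all item indices whose nonconformity stays below the threshold."""
    nonconformity = 1.0 - np.asarray(item_scores)
    return np.nonzero(nonconformity <= threshold)[0]

# Hypothetical calibration scores and one test user's candidate-item scores.
rng = np.random.default_rng(0)
cal = rng.uniform(0.3, 0.9, size=200)         # scores of true next items
thr = conformal_threshold(cal, alpha=0.1)
test_scores = rng.uniform(0.0, 1.0, size=50)  # scores over 50 candidate items
print(prediction_set(test_scores, thr))
```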
ECLIPSE: Efficient Cross-Lingual Log Intelligence Parser with Semantic Entropy-Enhanced
LCS Algorithm
- Wei Zhang
- Xianfu Cheng
- Xiang Li
- Jian Yang
- Liying Zhang
- Xiangyuan Guan
- Zhoujun Li
Log parsing is essential in software engineering but is challenged by the immense complexity of log templates and diverse cross-platform and cross-lingual log
semantics and structures in industrial logs. We propose ECLIPSE, an Efficient Cross-platform and Cross-lingual Log Intelligent Parsing framework
with a Semantic Entropy-Enhanced Longest Common Subsequence algorithm in industrial
Environments. ECLIPSE leverages large language models to extract log keywords and
maintains a dynamic dictionary mapping these keywords to log templates. When parsing,
it retrieves candidate templates based on the keywords and log length. We design an
algorithm named Semantic Entropy-Enhanced Longest Common Subsequence (Entropy-ELCS)
for identifying the best template, improving token-level accuracy by incorporating
information entropy and semantic elements into the longest common subsequence algorithm.
The dictionary is updated with new keywords and templates for continuous improvement.
Experiments on public benchmarks and our industrial log parsing benchmark ECLIPSE-BENCH
demonstrate that ECLIPSE achieves strong performance and superior efficiency, especially
when handling large template sets.
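The abstract does not spell out how entropy enters the matching, so the sketch below is only one plausible reading: score each candidate template with a token-level longest common subsequence in which a matched token contributes its information content (rarer tokens weigh more), and keep the best-scoring template. The templates, tokenization, and weighting are illustrative, not the paper's Entropy-ELCS.

```python
import math
from collections import Counter

def token_weights(templates):
    """Information-content weight per token: -log p(token) over the corpus of
    known templates, so rare (more discriminative) tokens weigh more."""
    counts = Counter(tok for tpl in templates for tok in tpl)
    total = sum(counts.values())
    return {tok: -math.log(c / total) for tok, c in counts.items()}

def weighted_lcs(log_tokens, template, weights):
    """Standard LCS dynamic program, except a match contributes the token's
    weight instead of 1. Wildcard tokens ('<*>') never match literally."""
    n, m = len(log_tokens), len(template)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if log_tokens[i - 1] == template[j - 1] != "<*>":
                dp[i][j] = dp[i - 1][j - 1] + weights.get(template[j - 1], 1.0)
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

# Hypothetical templates and an incoming log line.
templates = [["connection", "to", "<*>", "failed"],
             ["user", "<*>", "logged", "in"]]
w = token_weights(templates)
log_line = "connection to 10.0.0.2 failed".split()
best = max(templates, key=lambda t: weighted_lcs(log_line, t, w))
print(best)
```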
TopKNet: Learning to Perceive the Top-K Pivotal Nodes in Spatio-Temporal Data for Traffic
Forecasting
Traffic prediction is a crucial research area in spatio-temporal forecasting. The
key to traffic forecasting lies in the effective modelling of complex dependencies
in spatio-temporal graph as well as capturing spatio-temporal heterogeneity. The majority
of existing methodologies harness fully connected graphs derived from spatio-temporal
graph neural networks or transformer-based models to model intricate spatio-temporal
dependencies. However, it is paramount to recognize that not every node within this
spatio-temporal graph contributes equally to the modeling of such dependencies. Consequently,
the ability to discern and effectively leverage the significant nodes within the graph
holds the key to enhancing the accuracy of traffic prediction models. In this paper,
we center our attention on the pivotal nodes within the spatio-temporal graph and
propose a novel method called TopKNet for effective traffic forecasting. Specifically, we introduce Time-Aware TopK Attention and TopK GCN for pivotal nodes within the temporal and spatial dimensions respectively. Moreover,
Time-Aware Spatial Identity Embedding and Heterogeneity-Aware Loss are designed to characterise the spatio-temporal heterogeneity of nodes. Experiments
on six real-world traffic datasets verify our proposed method's effectiveness compared
to state-of-the-art baselines. These results offer fresh perspectives and insights
that can enrich the endeavors of subsequent researchers working on the design and
optimization of traffic models. The code will be made public at the following website
https://github.com/randomforest1111/TopKNet.
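The abstract names Time-Aware TopK Attention without giving its exact form; the sketch below shows the generic pattern the name suggests, restricting attention to the k highest-scoring keys per query before the softmax. The shapes, scoring function, and value of k are assumptions.

```python
import torch
import torch.nn.functional as F

def topk_attention(q, k, v, top_k=8):
    """Attention restricted to the top_k highest-scoring keys per query.
    q: (N, d), k/v: (M, d). Non-selected keys are masked out before softmax."""
    scores = q @ k.transpose(0, 1) / k.shape[-1] ** 0.5        # (N, M)
    top_vals, top_idx = scores.topk(top_k, dim=-1)             # keep k best keys
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, top_idx, top_vals)                       # -inf elsewhere
    attn = F.softmax(mask, dim=-1)                             # zero off the top-k
    return attn @ v

# Hypothetical sensor-node features: 50 queries attending over 200 nodes.
q = torch.randn(50, 32)
kv = torch.randn(200, 32)
out = topk_attention(q, kv, kv, top_k=8)
print(out.shape)  # torch.Size([50, 32])
```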
LLM4CD: Leveraging Large Language Models for Open-World Knowledge Augmented Cognitive
Diagnosis
- Weiming Zhang
- Lingyue Fu
- Qingyao Li
- Kounianhua Du
- Jianghao Lin
- Jingwei Yu
- Wei Xia
- Weinan Zhang
- Ruiming Tang
- Yong Yu
Cognitive diagnosis (CD) plays a crucial role in intelligent education, evaluating
students' comprehension of knowledge concepts based on their test histories. However,
current CD methods often model students, exercises, and knowledge concepts based solely
on their ID relationships, neglecting the abundant semantic relationships present
within the educational data space. Furthermore, contemporary intelligent tutoring
systems (ITS) frequently involve the addition of new students and exercises, creating
cold-start scenarios that ID-based methods find challenging to manage effectively.
The advent of large language models (LLMs) offers the potential for overcoming this
challenge with open-world knowledge. In this paper, we propose LLM4CD, which Leverages Large Language Models for open-world knowledge Augmented Cognitive
Diagnosis. Our method utilizes the open-world knowledge of LLMs to construct cognitively
expressive textual representations, which are then encoded to introduce rich semantic
information into the CD task. Additionally, we propose an innovative bi-level encoder
framework that models students' test histories through two levels of encoders: a macro-level
cognitive text encoder and a micro-level knowledge state encoder. This approach substitutes
traditional ID embeddings with semantic representations, enabling the model to accommodate
new students and exercises with open-world knowledge and address the cold-start problem.
Extensive experimental results demonstrate that LLM4CD consistently outperforms previous
CD models on multiple real-world datasets, validating the effectiveness of leveraging
LLMs to introduce rich semantic information into the CD task.
FEDDGCN: A Frequency-Enhanced Decoupling Dynamic Graph Convolutional Network for Traffic
Flow Prediction
- Wendong Zhang
- Ruobai Xiang
- Zhifang Liao
- Peng Lan
- Qihao Liang
As a core task in Intelligent Transportation Systems (ITS), traffic flow prediction
is essential for resource allocation and real-time route planning. Effectively capturing
complex temporal correlations and dynamic spatial dependencies in traffic flow data
is critical yet challenging for accurate prediction. However, existing approaches
are still limited by the insufficient capability for spatial-temporal pattern decoupling
and the underutilization of frequency domain information. To address these issues,
we propose a novel Frequency-Enhanced Dynamic Decoupling Graph Convolutional Network
(FEDDGCN), which introduces a gated decoupling mechanism integrating temporal and
spatial embeddings to decouple traffic flow into prominent periodic and perturbative
components. It also achieves effective pattern separation by incorporating frequency
domain analysis with Fourier filters. Furthermore, a dual-branch spatial-temporal
learning module, employing a divide-and-conquer strategy, is designed to achieve separate
modeling for the two distinct components. Specifically, the dynamic graph convolution
modules are utilized to learn spatial dependencies, while temporal and frequency attention
mechanisms further capture complex temporal correlations for the prominent periodic and
perturbative components. Extensive experiments on multiple real-world datasets demonstrate
that FEDDGCN achieves superior predictive performance compared with state-of-the-art
methods.
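As a concrete illustration of the frequency-domain decoupling idea (not the paper's gated mechanism), the sketch below splits a traffic series into a prominent periodic component and a perturbation residual by retaining only the dominant Fourier coefficients. The number of retained frequencies and the synthetic series are hypothetical.

```python
import numpy as np

def frequency_decouple(x, keep=3):
    """Split a 1-D series into (periodic, perturbation): keep the `keep`
    largest-magnitude Fourier coefficients (plus the DC term) as the periodic
    part; the remainder is treated as the perturbative component."""
    spec = np.fft.rfft(x)
    order = np.argsort(np.abs(spec[1:]))[::-1] + 1   # rank non-DC frequencies
    mask = np.zeros_like(spec)
    mask[0] = spec[0]                                # always keep the mean
    mask[order[:keep]] = spec[order[:keep]]
    periodic = np.fft.irfft(mask, n=len(x))
    return periodic, x - periodic

# Hypothetical traffic flow: one week of 5-minute slots with a daily cycle.
t = np.arange(7 * 288)
flow = 100 + 30 * np.sin(2 * np.pi * t / 288) \
       + np.random.default_rng(1).normal(0, 5, t.size)
periodic, perturbation = frequency_decouple(flow, keep=3)
print(periodic.shape, round(float(perturbation.std()), 2))
```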
Tide: A Time-Wise Causal Debiasing Framework for Generative Dynamic Link Prediction
- Xin Zhang
- Jianming Zheng
- Fei Cai
- Zhiqiang Pan
- Wanyu Chen
- Chonghao Chen
- Honghui Chen
Dynamic link prediction aims to predict the future links in dynamic graphs. Existing
generative dynamic link prediction studies utilize the global degree distribution
for mitigating the over-estimation problem, which can model the time-invariant features
while neglecting the time-varying features, resulting in capturing inaccurate evolution
patterns. However, such time-related features are intrinsically coupled, which makes
simultaneously and independently modeling both features infeasible. Motivated by these
issues, we propose a Time-wise causal debiasing framework (Tide) for generative dynamic link prediction, which does not resort to any extra trainable
modules. Instead, to obtain the time-invariant features, we first utilize a time-invariant
deconfounded learning mechanism to decouple the prediction score from the degree
distribution. To leverage the time-varying features, we intervene in the model during
the inference stage using a predicted future degree distribution, aiming to make
accurate predictions for dynamic graphs. Experiments conducted on four public datasets
under both inductive and transductive settings show that our Tide-enhanced models
can outperform their corresponding vanilla versions by up to 21.42% and 27.73% in
terms of NDCG and Jaccard, respectively.
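Tide's two training-free interventions are deconfounding scores from the observed degree distribution and re-intervening with a predicted future degree distribution at inference. A minimal reweighting sketch following that outline is shown below; the exact adjustment in Tide may differ, and the degree predictor here is just a placeholder moving average.

```python
import numpy as np

def debias_scores(scores, observed_degree, future_degree, eps=1e-8):
    """Training-free reweighting of link scores over candidate target nodes:
    divide out the observed degree prior (deconfounding), then re-inject a
    predicted future degree prior (inference-time intervention)."""
    obs_prior = observed_degree / (observed_degree.sum() + eps)
    fut_prior = future_degree / (future_degree.sum() + eps)
    return scores / (obs_prior + eps) * fut_prior

def predict_future_degree(degree_history):
    """Placeholder predictor: exponential moving average over past snapshots.
    degree_history: (T, N) array of per-node degrees, oldest row first."""
    weights = 0.5 ** np.arange(degree_history.shape[0])[::-1]
    return (weights[:, None] * degree_history).sum(0) / weights.sum()

# Hypothetical scores over 5 candidate nodes and 3 past degree snapshots.
scores = np.array([0.9, 0.7, 0.4, 0.2, 0.1])
history = np.array([[10, 5, 2, 1, 1], [12, 6, 2, 1, 1], [15, 6, 3, 1, 1]])
future = predict_future_degree(history)
print(debias_scores(scores, history[-1].astype(float), future))
```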
Hyperbolic Prompt Learning for Incremental Event Detection with LLMs
- Xiujin Zhang
- Wenxin Jin
- Haotian Hong
- Pengfei Zhang
- Jiting Li
- Kongjing Gu
- Hao Peng
- Li Sun
Class-incremental event detection (CIED) is essential for real-world information extraction
systems, which must continually recognize new event types without forgetting past
knowledge. The main challenge lies in balancing stability and adaptability under data
imbalance. Existing methods often underuse the hierarchical and syntactic structures
of language, and thus limit the generalization capacity. We propose HPLLM, a hyperbolic
prompt-enhanced large language model framework, motivated by the observation that
both embedding distributions and dependency graphs in event datasets exhibit hyperbolic
properties. HPLLM integrates two key components: (1) Hyperbolic LoRA fine-tuning,
enabling geometry-aware parameter adaptation for hierarchical semantics; and (2) Hyperbolic
Adaptive Graph Diffusion Convolution (HADC), which encodes syntactic dependencies
into structure-aware prompts for LLMs. Together, these techniques strengthen semantic
discrimination, reduce forgetting, and improve adaptation across incremental stages.
Extensive experiments on ACE2005 and MAVEN demonstrate that HPLLM consistently surpasses
state-of-the-art baselines in macro-F1, achieving stronger retention of old knowledge
and better generalization to new event types. In particular, the model shows clear
gains on rare categories with few training mentions, demonstrating its robustness
in imbalanced and few-shot regimes.
Traffic Safety Evaluation Based on Macroscopic Traffic Features in Road Tunnels
- Yupu Zhang
- Lei Jia
- Hao Miao
- Weizhu Qian
- Yan Zhao
- Kai Zheng
Traffic accidents are one of the leading causes of death in the world. As an important
part of road infrastructure design, tunnels bring convenience but also pose huge
safety risks. To monitor road safety in real time and give timely warnings for drivers
in tunnels, where the light is dark, the space is limited, and the signal is unstable,
we study the problem of traffic safety evaluation based on macroscopic traffic features
in road tunnels. In particular, we transform the problem into a four-class classification
problem. To overcome the long collection cycle of traffic crash data, we use the time-to-collision
index as the standard for dividing safety levels of road sections in tunnels. To achieve
the goal of collecting data in real time under the environment constraints of tunnels,
we use macroscopic traffic features as input in our model. Specifically, we design
a deep learning model, where the lane block can extract the interaction information
of sequential road segments in the same lane, and the prediction block can integrate
the results of the individual prediction of each lane and the overall prediction.
An extensive empirical study with real data offers insight into the effectiveness and
efficiency of the proposed model.
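The safety labels are derived from the time-to-collision (TTC) surrogate; the abstract does not report the class thresholds, so those in the sketch below are hypothetical. TTC is only finite when the following vehicle is closing the gap.

```python
def time_to_collision(gap_m, v_follower_ms, v_leader_ms):
    """TTC in seconds for a car-following pair; infinite when the follower is
    not closing the gap (v_follower <= v_leader)."""
    closing_speed = v_follower_ms - v_leader_ms
    return gap_m / closing_speed if closing_speed > 0 else float("inf")

def safety_level(ttc_s, thresholds=(1.5, 3.0, 5.0)):
    """Map TTC to one of four levels (0 = most dangerous). The thresholds are
    hypothetical, not the ones used in the paper."""
    for level, limit in enumerate(thresholds):
        if ttc_s < limit:
            return level
    return len(thresholds)

ttc = time_to_collision(gap_m=25.0, v_follower_ms=22.0, v_leader_ms=15.0)
print(ttc, safety_level(ttc))   # ~3.57 s -> level 2
```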
TAGA: Text-Attributed Graph Self-Supervised Learning by Synergizing Graph and Text
Mutual Transformations
- Zheng Zhang
- Yuntong Hu
- Bo Pan
- Chen Ling
- Liang Zhao
Text-Attributed Graphs (TAGs) enhance graph structures with natural language descriptions,
enabling detailed representation of data and their relationships across a broad spectrum
of real-world scenarios. Despite the potential for deeper insights, existing TAG representation
learning methods primarily omit the semantic relationships among node texts and mostly rely
on supervised methods, necessitating extensive labeled data and limiting applicability
across diverse contexts. This paper introduces a new self-supervised learning framework,
Text-Attributed-Graph Multi-View Alignment (TAGA), which overcomes these constraints
by integrating TAGs' structural and semantic dimensions. TAGA constructs two complementary
views: Text-of-Graph view, which organizes node texts into structured documents based
on graph topology, and the Graph-of-Text view, which converts textual nodes and connections
into graph data. By aligning representations from both views, TAGA captures joint
textual and structural information. In addition, a novel structure-preserving random
walk algorithm is proposed for efficient training on large-sized TAGs. Our framework
demonstrates strong performance in zero-shot and few-shot scenarios across eight real-world
datasets.
Transferable Deep Clustering Model
Deep learning has shown remarkable success in the field of clustering recently. However,
how to transfer a trained clustering model on a source domain to a target domain by
leveraging the acquired knowledge to guide the clustering process remains challenging.
Existing deep clustering methods often lack generalizability to new domains because
they typically learn a group of fixed cluster centroids, which may not be optimal
for the new domain distributions. In this paper, we propose a novel transferable deep
clustering model that can automatically adapt the cluster centroids according to the
distribution of data samples. Rather than learning a fixed set of centroids, our approach
introduces a novel attention-based module that can adapt the centroids by measuring
their relationship with samples. In addition, we theoretically show that our model
is strictly more powerful than some classical clustering algorithms such as k-means
or Gaussian Mixture Model (GMM). Experimental results on both synthetic and real-world
datasets demonstrate the effectiveness and efficiency of our proposed transfer learning
framework, which significantly improves performance on the target domain and reduces
the computational cost.
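The core idea, adapting centroids to the target distribution through an attention module over samples, can be sketched as below: each source centroid attends over a target batch and is nudged toward its attention-weighted mean. The residual mixing coefficient and dimensions are assumptions, not the paper's exact module.

```python
import torch
import torch.nn.functional as F

def adapt_centroids(centroids, samples, alpha=0.5):
    """Adapt source-domain centroids to a target batch: each centroid attends
    over the target samples and moves toward its attention-weighted mean.
    centroids: (K, d), samples: (N, d)."""
    scores = centroids @ samples.T / centroids.shape[-1] ** 0.5   # (K, N)
    attn = F.softmax(scores, dim=-1)
    attended = attn @ samples                                     # (K, d)
    return (1 - alpha) * centroids + alpha * attended             # residual update

def assign_clusters(samples, centroids):
    """Hard assignment to the nearest adapted centroid (Euclidean)."""
    return torch.cdist(samples, centroids).argmin(dim=-1)

# Hypothetical setting: 4 source centroids and a shifted target batch.
centroids = torch.randn(4, 16)
target = torch.randn(256, 16) + 1.0                               # domain shift
adapted = adapt_centroids(centroids, target)
print(assign_clusters(target, adapted).bincount())
```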
A Privacy-preserving Spatial Dataset Joinable Search in Cloud
- Zhengkai Zhang
- Hua Dai
- Hao Zhou
- Mingfeng Jiang
- Pengyue Li
- Geng Yang
In the era of big data, the demand for spatial dataset search has become increasingly
urgent. Leveraging the powerful storage and computing capabilities of cloud platforms,
the cloud has become a common choice for deploying dataset search services. However,
under the risks of untrusted cloud environments and malicious attacks, protecting the privacy
of sensitive location information during spatial dataset search becomes particularly
critical. This paper focuses on the problem of privacy-preserving spatial dataset
joinable search in the cloud, which has not been addressed in existing research. We first
propose a grid-based joinable coverage distinction model to measure the joinability
of spatial datasets, and further present a baseline scheme (PDJDS). To further enhance
efficiency and reduce storage cost, we propose an optimized scheme (PDJDS+), which
constructs a coarse-grained grid-based inverted index to filter candidate datasets
and integrates a joinable coverage distinction check table to expedite the evaluation
of spatial dataset coverage distinction. Experiments conducted on three real-world
spatial data repositories demonstrate that our scheme achieves superior performance
in terms of search accuracy, efficiency, and storage cost.
Exploring Causal Effect of Social Bias on Faithfulness Hallucinations in Large Language
Models
- Zhenliang Zhang
- Junzhe Zhang
- Xinyu Hu
- Huixuan Zhang
- Xiaojun Wan
Large language models (LLMs) have achieved remarkable success in various tasks, yet
they remain vulnerable to faithfulness hallucinations, where the output does not align
with the input. In this study, we investigate whether social bias contributes to these
hallucinations, a causal relationship that has not been explored. A key challenge
is controlling confounders within the context, which complicates the isolation of
causality between bias states and hallucinations. To address this, we utilize the
Structural Causal Model (SCM) to establish and validate the causality and design bias
interventions to control confounders. In addition, we develop the Bias Intervention
Dataset (BID), which includes various social biases, enabling precise measurement
of causal effects. Experiments on mainstream LLMs reveal that biases are significant
causes of faithfulness hallucinations, and the effect of each bias state differs in
direction. We further analyze the scope of these causal effects across various models,
specifically focusing on unfairness hallucinations, which are primarily targeted by
social bias, revealing the subtle yet significant causal effect of bias on hallucination
generation.
Yes is Harder than No: A Behavioral Study of Framing Effects in Large Language Models
Across Downstream Tasks
- Ziheng Zhang
- Weixin Zeng
- Jiuyang Tang
- Ji Wang
- Xiang Zhao
Framing effect is a well-known cognitive bias in which individuals' responses to the
same underlying question vary depending on how the question is phrased. Recent studies
suggest that large language models (LLMs) also exhibit framing effects, but existing
work has primarily replicated psychological experiments using hand-crafted prompts,
leaving their impact on practical downstream tasks underexplored. To fill this gap,
in this paper, we conduct a systematic empirical investigation into framing effects
in LLMs across multiple real-world downstream tasks. We construct semantically equivalent
prompts with positive and negative framings and evaluate a wide range of LLMs under
these conditions. We uncover several behavioral regularities of framing effects in
LLMs, among which the most notable one is a consistent response asymmetry: LLMs find
answering ''yes'' harder than ''no''. That is, LLMs tend to issue affirmative responses
(i.e., ''yes'') only when they are highly confident, while they are inclined to answer
negatively (i.e., ''no'') under uncertainty. We interpret this asymmetry through the
lens of Error Management Theory (EMT), which posits that rational agents adopt risk-averse
strategies to minimize the more costly error. We empirically show that this behavior
is partially attributable to a statistical imbalance in the frequency of positive
versus negative framing cues in pretraining corpora. Furthermore, we demonstrate that
the framing-induced bias in LLMs can inform prompt engineering and active in-context
learning, i.e., using framing-sensitive samples as demonstrations can improve model
performance. Finally, we offer a preliminary strategy to mitigate the framing effect,
i.e., injecting debiasing instructions, which shows promise. In all, our work uncovers
a fundamental behavioral bias in LLMs and offers practical guidance for their reliable
deployment across downstream tasks.
Unbiased Reasoning for Knowledge-Intensive Tasks in Large Language Models via Conditional
Front-Door Adjustment
- Bo Zhao
- Yinghao Zhang
- Ziqi Xu
- Yongli Ren
- Xiuzhen Zhang
- Renqiang Luo
- Zaiwen Feng
- Feng Xia
Large Language Models (LLMs) have shown impressive capabilities in natural language
processing but still struggle to perform well on knowledge-intensive tasks that require
deep reasoning and the integration of external knowledge. Although methods such as
Retrieval-Augmented Generation (RAG) and Chain-of-Thought (CoT) have been proposed
to enhance LLMs with external knowledge, they still suffer from internal bias in LLMs,
which often leads to incorrect answers. In this paper, we propose a novel causal prompting
framework, Conditional Front-Door Prompting (CFD-Prompting), which enables the unbiased
estimation of the causal effect between the query and the answer, conditional on external
knowledge, while mitigating internal bias. By constructing counterfactual external
knowledge, our framework simulates how the query behaves under varying contexts, addressing
the challenge that the query is fixed and is not amenable to direct causal intervention.
Compared to the standard front-door adjustment, the conditional variant operates under
weaker assumptions, enhancing both robustness and generalisability of the reasoning
process. Extensive experiments across multiple LLMs and benchmark datasets demonstrate
that CFD-Prompting significantly outperforms existing baselines in both accuracy and
robustness.
Robust Heterogeneous GNNs via Semantic Attention and Contrastive Learning
- Chongjie Zhao
- Jinyan Wang
- Linlin Su
- Zeming Gan
- Ziyang Zhou
Heterogeneous Graph Neural Networks (HGNNs) have achieved significant success in various
graph-related applications. However, their vulnerability to adversarial attacks remains
insufficiently studied, particularly when malicious modifications are made to the
graph structure, such as the addition of redundant edges or the removal of key semantic
edges. These perturbations not only interfere with the aggregation of neighboring
information but also compromise the semantic integrity of meta-paths, leading to a
significant degradation in model performance. Although previous studies have preliminarily
revealed the impact of structural perturbations on HGNN performance, there has been
insufficient exploration into how to develop efficient defense mechanisms from both
structural and semantic perspectives. To address this, we propose a novel defense
framework that integrates a meta-path-guided semantic-aware attention mechanism. This
mechanism dynamically adjusts edge weights to suppress noisy connections and enhance
those that are structurally and semantically significant. Additionally, to compensate
for the expressive power of raw features, we introduce a contrastive learning strategy
that combines local and global structural augmentations to guide the model in learning
perturbation-invariant representations in a self-supervised manner. Extensive experiments
on multiple real-world heterogeneous graph datasets demonstrate that our proposed
method significantly improves the robustness of HGNNs against adversarial attacks,
showing both effectiveness and generalization capability.
Hybrid2: Distributed GNN Training System Enhanced by Dual-Hybrid for Sampling and
Loading
- Chu Zhao
- Shengjie Dong
- Yuhai Zhao
- Yuan Li
- Zhengkui Wang
- Xingwei Wang
Graph Neural Networks (GNNs) are the rising standard for graph tasks, yet their distributed
training in server clusters or computing power networks remains challenging. Cross-machine
sampling and data loading often create bottlenecks, leading to inefficient resource
utilization. In this paper, we present Hybrid2, a distributed GNN training system that combines full-graph and mini-batch training
through a novel hybrid-batch training method. It also adopts hybrid feature extraction,
leveraging both local caching and remote access to improve feature retrieval efficiency.
The integration of these methods in Hybrid² results in a dual hybrid-gain effect.
First, it reduces sampling and loading overhead by pre-aggregating neighbors for each
target vertex, minimizing the layers to sample and load. Second, it accelerates data
loading by dynamically identifying and locally caching the most frequently accessed
vertices during training, maximizing memory efficiency. Experimental results demonstrate
that Hybrid² brings substantial performance improvements across key components of
distributed GNN training. Network communication overhead is reduced by up to tens
of times, while both sampling and loading achieve at least several-fold speedups.
These gains contribute to an overall training acceleration exceeding 20× compared
to DistDGL, all with comparable GPU memory usage and no loss in accuracy. Compared
to the state-of-the-art system, it achieves nearly 3× speedup while using fewer resources.
CEM: A Data-Efficient Method for Large Language Models to Continue Evolving From Mistakes
- Haokun Zhao
- Jinyi Han
- Jie Shi
- Chengyu Du
- Jiaqing Liang
- Yanghua Xiao
- Weikang Zhou
- Zeye Sun
- Fei Yu
Large Language Models (LLMs) achieve remarkable success, but their static nature leads
to inherent limitations and persistent mistakes in dynamic real-world scenarios. While
Continual Instruction Tuning (CIT) and Continual Pre-training (CPT) are primary continual
learning approaches, they struggle with scalable knowledge acquisition and maintaining
model capabilities. To address these issues, we propose the Continue Evolving from Mistakes (CEM) method, a novel and data-efficient framework for continuous LLM evolution. Inspired
by human learning, CEM establishes an iterative process: it efficiently collects targeted
CPT data by robustly identifying LLM mistakes and uncertainties (via an Ambiguity-Aware
Knowledge Collection (AAKC) algorithm), and employs a novel joint training paradigm
that leverages CIT and CPT to assimilate knowledge efficiently while maintaining existing
capabilities and mitigating catastrophic forgetting. Extensive experiments confirm
CEM's effectiveness, yielding substantial accuracy gains for multiple models, increasing
accuracy by up to 29.63%. Code and datasets are available at https://anonymous.4open.science/r/cem-BB25.
Antelope: Potent and Concealed Jailbreak Attack Strategy
- Xin Zhao
- Xiaojun Chen
- Haoyu Gao
Due to the remarkable generative potential of diffusion-based models, numerous studies
have investigated jailbreak attacks targeting these frameworks. A particularly concerning
threat within image models is the generation of Not-Safe-for-Work (NSFW) content.
Despite the implementation of security filters, numerous efforts continue to explore
ways to circumvent these safeguards. Current attack methodologies primarily encompass
adversarial prompt engineering or concept obfuscation, yet they frequently suffer
from slow search efficiency, conspicuous attack characteristics and poor alignment
with targets. To overcome these challenges, we propose Antelope, a more robust and
covert jailbreak attack strategy designed to expose security vulnerabilities inherent
in generative models. Specifically, Antelope leverages the confusion of sensitive
concepts with similar ones, facilitates searches in the semantically adjacent text
space of these related concepts and aligns them with the target imagery, thereby generating
sensitive images that are consistent with the target and capable of evading detection.
Besides, we successfully exploit the transferability of model-based attacks to penetrate
online black-box services. Experimental evaluations demonstrate that Antelope outperforms
existing baselines across multiple defensive mechanisms, underscoring its efficacy
and versatility.
Disclaimer: This paper contains unsafe imagery that might be offensive to some readers.
ReCode: Improving LLM-based Code Repair with Fine-Grained Retrieval-Augmented Generation
- Yicong Zhao
- Shisong Chen
- Jiacheng Zhang
- Zhixu Li
Recent advances in large language models (LLMs) have demonstrated impressive capabilities
in code-related tasks such as code generation and automated program repair. Despite
their promising performance, most existing approaches for code repair suffer from
high training costs or computationally expensive inference. Retrieval-augmented generation
(RAG), with its efficient in-context learning paradigm, offers a more scalable alternative.
However, conventional retrieval strategies, which are often based on holistic code-text
embeddings, fail to capture the structural intricacies of code, resulting in suboptimal
retrieval quality. To address the above limitations, we propose ReCode, a fine-grained
retrieval-augmented in-context learning framework designed for accurate and efficient
code repair. Specifically, ReCode introduces two key innovations: (1) an algorithm-aware retrieval strategy that narrows
the search space using preliminary algorithm type predictions; and (2) a modular dual-encoder
architecture that separately processes code and textual inputs, enabling fine-grained
semantic matching between input and retrieved contexts. Furthermore, we propose RACodeBench,
a new benchmark constructed from real-world user-submitted buggy code, which addresses
the limitations of synthetic benchmarks and supports realistic evaluation. Experimental
results on RACodeBench and competitive programming datasets demonstrate that ReCode
achieves higher repair accuracy with significantly reduced inference cost, highlighting
its practical value for real-world code repair scenarios.
FreeGAD: A Training-Free yet Effective Approach for Graph Anomaly Detection
- Yunfeng Zhao
- Yixin Liu
- Shiyuan Li
- Qingfeng Chen
- Yu Zheng
- Shirui Pan
Graph Anomaly Detection (GAD) aims to identify nodes that deviate from the majority
within a graph, playing a crucial role in applications such as social networks and
e-commerce. Despite the current advancements in deep learning-based GAD, existing
approaches often suffer from high deployment costs and poor scalability due to their
complex and resource-intensive training processes. Surprisingly, our empirical findings
suggest that the training phase of deep GAD methods, commonly perceived as crucial,
may actually contribute less to anomaly detection performance than expected. Inspired
by this, we propose FreeGAD, a novel training-free yet effective GAD method. Specifically, it leverages an affinity-gated
residual encoder to generate anomaly-aware representations. Meanwhile, FreeGAD identifies
anchor nodes as pseudo-normal and anomalous guides, followed by calculating anomaly
scores through anchor-guided statistical deviations. Extensive experiments demonstrate
that FreeGAD achieves superior anomaly detection performance, efficiency, and scalability
on multiple benchmark datasets from diverse domains, without any training or iterative
optimization.
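FreeGAD's pipeline is training-free: parameter-free propagation for anomaly-aware representations, anchor selection, and anchor-guided deviation scores. The sketch below follows that outline with deliberately simple stand-ins (mean-neighbor propagation with a residual term, anchors chosen by affinity to the global mean, scores as a distance gap between normal and anomalous anchors); the paper's affinity-gated encoder and statistics are more elaborate.

```python
import numpy as np

def propagate(adj, x, hops=2, residual=0.5):
    """Parameter-free residual propagation: repeatedly mix each node's feature
    with the mean of its neighbors. adj: (N, N) 0/1 matrix, x: (N, d)."""
    deg = adj.sum(1, keepdims=True).clip(min=1)
    h = x.copy()
    for _ in range(hops):
        h = residual * h + (1 - residual) * (adj @ h) / deg
    return h

def anomaly_scores(h, n_anchors=10):
    """Anchor-guided scoring: nodes closest to the global mean act as pseudo-
    normal anchors, the farthest as pseudo-anomalous ones; the score is the
    mean distance to normal anchors minus that to anomalous anchors."""
    affinity = -np.linalg.norm(h - h.mean(0), axis=1)
    order = np.argsort(affinity)
    normal, abnormal = h[order[-n_anchors:]], h[order[:n_anchors]]
    d_norm = np.linalg.norm(h[:, None] - normal[None], axis=-1).mean(1)
    d_abn = np.linalg.norm(h[:, None] - abnormal[None], axis=-1).mean(1)
    return d_norm - d_abn            # larger = more anomalous

# Hypothetical graph with 100 nodes, 5 of which have outlying features.
rng = np.random.default_rng(0)
adj = (rng.random((100, 100)) < 0.05).astype(float)
adj = np.maximum(adj, adj.T)
x = rng.normal(size=(100, 8))
x[:5] += 4.0
scores = anomaly_scores(propagate(adj, x))
print(np.argsort(scores)[-5:])       # indices of the most anomalous nodes
```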
Adapting LLMs for Personalized Evaluation of Explanations for Recommendations: A Meta-Learning
Approach based on MAML
- Yurou Zhao
- Yingfei Zhang
- Quan Zhou
- Shuang Zhang
- Wei Lin
- Jiaxin Mao
Providing explanations to justify recommendations enhances user satisfaction and trust.
Despite significant research on explanation generation methods, evaluating their quality
remains a critical yet under-explored challenge. Although large language models (LLMs)
have been used for automated evaluation of explanations, existing approaches fail
to account for the highly personalized nature of explanation assessment, where user
judgments towards the same explanations vary significantly. To address this, we propose
a MAML+PEFT method that combines Model-Agnostic Meta-Learning (MAML) with LoRA-based
parameter-efficient tuning to adapt LLMs for personalized explanation evaluation.
Building on this, we introduce TSA-MAML (Task Similarity Aware MAML)+PEFT, which clusters
users based on their estimated optimal model parameters and learns group-specific
meta models by leveraging implicit group distributions of user preferences. Experiments
on synthetic and human-annotated datasets demonstrate superior alignment of MAML-based
methods with human ratings in both generalization and few-shot adaptation settings.
Additionally, we examine the correlation of MAML-based LLM-simulated human ratings
with real online user behaviors on a large-scale recommendation platform, demonstrating
the practical utility of our methods for real-world explainable recommendation systems.
ClariLM: Enhancing Open-domain Clarification Ability for Large Language Models
- Ziliang Zhao
- Haonan Chen
- Shiren Song
- Jian Xie
- Zhicheng Dou
Active understanding and clarification of user intent is crucial for information-seeking
systems based on Large Language Models (LLMs), as it enhances search efficiency and
improves user experience for human-LLM interaction. While existing systems rely on
domain-specific resources to generate clarifying questions, they face challenges when
extended to open-domain scenarios due to the lack of human-LLM clarification data. In this paper, we propose ClariLM to synthesize large-scale clarification data and enhance the LLMs' clarification
capability. Specifically, we design two key stages to prepare data: first, given a
user question, the Clarification Facet Detection (CFD) stage employs a facet mining
model learned from human-LLM conversation logs to predict realistic potential clarification
candidates. Additionally, it incorporates direct predictions from powerful LLMs as
supplements to guarantee comprehensive facet coverage. While CFD ensures high recall
of facet candidates, the subsequent Optimal Facet Selection (OFS) stage synthesizes
a set of new questions and employs a reasoning model to annotate the optimal facet
for each question, which further improves the precision of ClariLM in clarification
necessity prediction and optimal facet selection. The collected data are then applied
for supervised fine-tuning, followed by constructing preference data for preference
optimization. Experiments on our custom test set and two public benchmarks demonstrate
that ClariLM significantly outperforms various baseline models across clarification
necessity, clarifying question quality, and GPT-4-based comparative evaluation.
FollowGPT: A Framework of Follow-up Question Generation for Large Language Models
via Conversation Log Mining
- Ziliang Zhao
- Shiren Song
- Zhicheng Dou
During interactions between users and Large Language Models (LLMs), users often engage
in multi-turn questioning. Understanding the user's potential follow-up intents and
generating follow-up question candidates for the user is crucial for enhancing their
experience with LLMs. Existing methods for follow-up question generation mainly rely
on hand-crafted rules, the internal knowledge of LLMs, or the integration of external
knowledge. However, these approaches fail to effectively leverage real-world user follow-up intents when interacting with LLMs, resulting in generated questions that do not meet the
needs of practical scenarios. In this paper, we propose FollowGPT, a model that mines user follow-up intents from user-LLM conversational logs. However, directly introducing raw conversation logs leads to significant noise and
sparsity issues. Therefore, to address the noise, FollowGPT adopts a hierarchical filtering strategy for data cleaning. To mitigate
the sparsity issue, FollowGPT employs data synthesis methods to augment the log data across three
dimensions: topic diversity, intent transition diversity, and negative sample diversity.
The processed data is then consolidated into a new dataset named ShareFQG for both training and evaluation. Finally, we train FollowGPT using a two-stage training
framework involving supervised fine-tuning and preference optimization. In our experiments,
we evaluate FollowGPT on both the ShareFQG test set and a publicly available dataset, FollowupQG,
using both automated metrics and GPT-4o-based comparative evaluation. The experimental
results show that our method outperforms existing baselines across various metrics,
including lexical similarity, semantic similarity, and GPT-4-based evaluation,
demonstrating FollowGPT's effectiveness for follow-up question generation.
Autonomous Reasoning-Retrieval for Large Language Model Based Recommendation
- Bowen Zheng
- Xiaolei Wang
- Enze Liu
- Xi Wang
- Hongyu Lu
- Yu Chen
- Wayne Xin Zhao
- Ji-Rong Wen
Recently, large language models (LLMs) have been introduced into recommender systems
(RSs) as recommendation backbones or to enhance traditional recommendation models
(TRMs). However, existing LLM-based RSs fail to fully leverage the complementary strengths
of LLMs (e.g., world knowledge and reasoning capabilities) and TRMs (e.g., recommendation-specific
knowledge and computational efficiency), resulting in shallow exploration of the item
space. To address this limitation, we propose DeepRec, a novel LLM-based RS approach that facilitates autonomous multi-turn interactions between LLMs and TRMs for deep item space exploration. In each interaction turn, LLMs reason over user preferences
and collaborate with TRMs to retrieve candidate items. After multi-turn interaction,
LLMs rank the aggregated candidates to generate the final recommendations. We utilize
reinforcement learning (RL) for optimization and introduce novel contributions in
three key aspects: recommendation model based data rollout, recommendation-oriented
hierarchical rewards, and a two-stage RL training strategy. For data rollout, we design
a preference-aware TRM, with which LLMs interact to construct trajectory data. For
reward design, we propose a hierarchical reward function that comprises both process-level
and outcome-level rewards to optimize the interaction process and recommendation quality,
respectively. For RL training, our two-stage RL strategy first guides LLMs to learn
effective interactions with TRMs, followed by recommendation-oriented RL for performance
enhancement. Experiments on public datasets show that DeepRec substantially outperforms
both traditional and existing LLM-based baselines, establishing a new paradigm for
deep exploration in recommender systems.
Adaptive Spline Networks in the Kolmogorov-Arnold Framework: Knot Analysis and Stability
Enhancement
- Liangwei Nathan Zheng
- Wei Emma Zhang
- Lin Yue
- Miao Xu
- Olaf Maennel
- Weitong Chen
Kolmogorov-Arnold Neural Networks (KANs) have recently attracted significant attention
in the machine learning community. However, their practical implementation often faces
challenges such as poor training stability and a large number of trainable parameters.
Moreover, the behavior of learnable activation functions based on B-splines remains
insufficiently understood. In this work, we analyze KANs through the lens of spline
knot behavior and derive lower and upper bounds on the number of knots in B-spline-based
KANs. To address the existing limitations, we propose a novel KAN-based approach,
which improves upon the original KAN by reducing the number of trainable parameters
to match the scale of standard Multi-Layer Perceptrons (MLPs), while enhancing overall
performance. Additionally, we introduce a new training strategy that enforces C2 continuity in the learnable splines, leading to smoother activation functions and
improved training stability via range expansion. We evaluate our method across eight
diverse datasets encompassing image, text, time series, multimodal, and function approximation
tasks. The promising results demonstrate the feasibility of KAN-based architectures
and the effectiveness of our proposed enhancements. The implementation of the proposed
method is released at https://github.com/IcurasLW/FR-KAN.git
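The C2 continuity mentioned above is the natural smoothness class for cubic B-splines: with simple interior knots, a degree-3 spline already has continuous second derivatives. The sketch below evaluates one KAN-style edge activation as a cubic B-spline plus a small linear pass-through; the knot grid, coefficients, and the linear term are illustrative, and the paper's range-expansion training strategy is not reproduced.

```python
import numpy as np
from scipy.interpolate import BSpline

def kan_edge_activation(coeffs, x, lo=-2.0, hi=2.0):
    """One KAN-style learnable edge function: a cubic B-spline (C2-continuous
    at simple knots) plus a small linear pass-through term. `coeffs` plays the
    role of the trainable parameters; the knot grid is fixed and illustrative."""
    k = 3                                            # cubic => C2 continuity
    n_interior = len(coeffs) - k - 1
    interior = np.linspace(lo, hi, n_interior + 2)[1:-1]
    knots = np.concatenate([[lo] * (k + 1), interior, [hi] * (k + 1)])
    spline = BSpline(knots, np.asarray(coeffs), k, extrapolate=True)
    return spline(x) + 0.1 * x                       # spline + linear residual

# Hypothetical coefficients for one edge and a sweep over the input range.
coeffs = [0.0, 0.5, 1.0, 0.2, -0.3, 0.1, 0.8, 0.0]
xs = np.linspace(-2, 2, 5)
print(kan_edge_activation(coeffs, xs))
```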
Modeling Edge-Specific Node Features through Co-Representation Neural Hypergraph Diffusion
- Yijia Zheng
- Marcel Worring
Hypergraphs are widely being employed to represent complex higher-order relations
in real-world applications. Most existing research on hypergraph learning focuses
on node-level or edge-level tasks. A practically relevant and more challenging task,
edge-dependent node classification (ENC), is still under-explored. In ENC, a node
can have different labels across different hyperedges, which requires the modeling
of node features unique to each hyperedge. The state-of-the-art ENC solution, WHATsNet,
only outputs single node and edge representations, leading to the limitations of entangled
edge-specific features and non-adaptive representation sizes when applied to ENC.
Additionally, WHATsNet suffers from the common oversmoothing issue in most HGNNs.
To address these limitations, we propose CoNHD, a novel HGNN architecture specifically
designed to model edge-specific features for ENC. Instead of learning separate representations
for nodes and edges, CoNHD reformulates within-edge and within-node interactions as
a hypergraph diffusion process over node-edge co-representations. We develop a neural
implementation of the proposed diffusion process, leveraging equivariant networks
as diffusion operators to effectively learn the diffusion dynamics from data. Extensive
experiments demonstrate that CoNHD achieves the best performance across all benchmark
ENC datasets and several downstream tasks without sacrificing efficiency. Our implementation
is available at https://github.com/zhengyijia/CoNHD.
MI4Rec: Pretrained Language Model based Cold-Start Recommendation with Meta-Item Embeddings
- Zaiyi Zheng
- Yaochen Zhu
- Haochen Liu
- Mingxuan Ju
- Tong Zhao
- Neil Shah
- Jundong Li
Recently, pretrained large language models (LLMs) have been widely adopted in recommendation
systems to leverage their textual understanding and reasoning abilities to model user
behaviors and suggest future items. A key challenge in this setting is that items
on most platforms are not included in the LLM's training data. Therefore, existing
methods often fine-tune LLMs by introducing auxiliary item tokens to capture item
semantics. However, in real-world applications such as e-commerce and short video
platforms, the item space evolves rapidly, which gives rise to a cold-start setting,
where many newly introduced items receive little or even no user engagement. This
poses challenges in both learning accurate item token embeddings and generalizing
efficiently to accommodate the continual influx of new items. In this work, we propose
a novel meta-item token learning strategy to address both these challenges simultaneously. Specifically,
we introduce MI4Rec, an LLM-based approach for recommendation that uses just a few
learnable meta-item tokens and an LLM encoder to dynamically aggregate meta-items
based on item content. We show that this paradigm allows highly efficient and accurate
learning in such challenging settings. Extensive experiments on Yelp and Amazon reviews
datasets demonstrate the effectiveness of MI4Rec in both warm-start and cold-start
recommendations. Notably, MI4Rec achieves an average performance improvement of 20.4%
in Recall and NDCG compared to the best-performing baselines. The implementation of
MI4Rec is available at https://github.com/zhengzaiyi/MI4Rec
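The core idea above, representing items with a small pool of learnable meta-item tokens aggregated according to item content rather than one trained token per item, can be pictured with a short sketch. The snippet below is a minimal, hypothetical illustration using content-conditioned attention over a meta-item table; the class name, dimensions, and attention form are assumptions for illustration, not the MI4Rec implementation.

```python
import torch
import torch.nn as nn

class MetaItemAggregator(nn.Module):
    """Aggregate a small pool of learnable meta-item tokens, weighted by item content.

    Hypothetical sketch: an item's content embedding (e.g., from an LLM encoder)
    attends over K meta-item tokens to produce the item token fed to the recommender,
    so brand-new items need no trained per-item token.
    """

    def __init__(self, num_meta_items: int = 16, dim: int = 64):
        super().__init__()
        self.meta_items = nn.Parameter(torch.randn(num_meta_items, dim) * 0.02)
        self.query_proj = nn.Linear(dim, dim)

    def forward(self, content_emb: torch.Tensor) -> torch.Tensor:
        # content_emb: (batch, dim) content representation of (possibly cold-start) items
        query = self.query_proj(content_emb)                      # (batch, dim)
        scores = query @ self.meta_items.T / self.meta_items.shape[1] ** 0.5
        weights = scores.softmax(dim=-1)                          # (batch, K)
        return weights @ self.meta_items                          # (batch, dim)

agg = MetaItemAggregator()
cold_item_content = torch.randn(4, 64)   # new items only need content embeddings
print(agg(cold_item_content).shape)      # torch.Size([4, 64])
```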
EvoFormer: Learning Dynamic Graph-Level Representations with Structural and Temporal
Bias Correction
- Haodi Zhong
- Liuxin Zou
- Di Wang
- Bo Wan
- Zhenxing Niu
- Quan Wang
Dynamic graph-level embedding aims to capture structural evolution in networks, which
is essential for modeling real-world scenarios. However, existing methods face two
critical yet under-explored issues: Structural Visit Bias, where random walk sampling
disproportionately emphasizes high-degree nodes, leading to redundant and noisy structural
representations; and Abrupt Evolution Blindness, the failure to effectively detect
sudden structural changes due to rigid or overly simplistic temporal modeling strategies,
resulting in inconsistent temporal embeddings. To overcome these challenges, we propose
EvoFormer, an evolution-aware Transformer framework tailored for dynamic graph-level
representation learning. To mitigate Structural Visit Bias, EvoFormer introduces a
Structure-Aware Transformer Module that incorporates positional encoding based on
node structural roles, allowing the model to globally differentiate and accurately
represent node structures. To overcome Abrupt Evolution Blindness, EvoFormer employs
an Evolution-Sensitive Temporal Module, which explicitly models temporal evolution
through a sequential three-step strategy: (I) Random Walk Timestamp Classification,
generating initial timestamp-aware graph-level embeddings; (II) Graph-Level Temporal
Segmentation, partitioning the graph stream into segments reflecting structurally
coherent periods; and (III) Segment-Aware Temporal Self-Attention combined with an
Edge Evolution Prediction task, enabling the model to precisely capture segment boundaries
and perceive structural evolution trends, effectively adapting to rapid temporal shifts.
Extensive evaluations on five benchmark datasets confirm that EvoFormer achieves state-of-the-art
performance in graph similarity ranking, temporal anomaly detection, and temporal
segmentation tasks, validating its effectiveness in correcting structural and temporal
biases. Code is available at https://github.com/zlx0823/EvoFormerCode.
Budget and Frequency Controlled Cost-Aware Model Extraction Attack on Sequential Recommenders
- Lei Zhou
- Min Gao
- Zongwei Wang
- Yibing Bai
Sequential recommenders are integral to many applications yet remain vulnerable to
model extraction attacks, in which adversaries can recover information about the deployed
model by issuing queries to a black-box without internal access. From the attacker's
perspective, existing studies impose a fixed and limited query budget but overlook
optimal allocation, resulting in redundant or low-value requests. Furthermore, the
scarce data obtained through these costly queries is typically handled by crude random
sampling, resulting in low diversity and limited information coverage compared with the actual data. In
this paper, we propose a novel approach, named Budget and Frequency Controlled Cost-Aware
Model Extraction Attack (BECOME), for extracting black-box sequential recommenders,
which extends the standard extraction framework with two cost-aware innovations: Feedback-Driven
Dynamic Budgeting periodically evaluates the victim model to refine query allocation
and steer sequence generation adaptively. Rank-Aware Frequency Controlling integrates
frequency constraints with ranking guidance in the next-item sampler to select high-value
items and broaden information coverage. Experiments on public datasets and representative
sequential recommender architectures demonstrate that our method achieves superior
extraction performance. Our code is released at https://github.com/Loche2/BECOME.
Enhancing Dual-Target Cross-Domain Recommendation via Similar User Bridging
- Qi Zhou
- Xi Chen
- Chuyu Fang
- Jianji Wang
- Chuan Qin
- Fuzhen Zhuang
Dual-target cross-domain recommendation aims to mitigate data sparsity and enable
mutual enhancement via bidirectional knowledge transfer. Most existing methods rely
on overlapping users to build cross-domain connections. However, in many real-world
scenarios, overlapping data is extremely limited, or even entirely absent, significantly
diminishing the effectiveness of these methods. To address this challenge, we propose
SUBCDR, a novel framework that leverages large language models (LLMs) to bridge similar
users across domains, thereby enhancing dual-target cross-domain recommendation. Specifically,
we introduce a Multi-Interests-Aware Prompt Learning mechanism that enables LLMs to
generate comprehensive user profiles, disentangling domain-invariant interest points
while capturing fine-grained preferences. Then, we construct intra-domain bipartite
graphs from user-item interactions and an inter-domain heterogeneous graph that links
similar users across domains. Subsequently, to facilitate effective knowledge transfer,
we employ Graph Convolutional Networks (GCNs) for intra-domain relationship modeling
and design an Inter-domain Hierarchical Attention Network (InterHAN) to facilitate
inter-domain knowledge transfer through similar users, learning both shared and specific
user representations. Extensive experiments on seven public datasets demonstrate that
SUBCDR outperforms state-of-the-art cross-domain recommendation algorithms and single-domain
recommendation methods. Our code is publicly available at https://github.com/97z/SUBCDR.git.
BALM-TSF: Balanced Multimodal Alignment for LLM-Based Time Series Forecasting
- Shiqiao Zhou
- Holger Schöner
- Huanbo Lyu
- Edouard Fouché
- Shuo Wang
Time series forecasting is a long-standing and highly challenging research topic.
Recently, driven by the rise of large language models (LLMs), research has increasingly
shifted from purely time series methods toward harnessing textual modalities to enhance
forecasting performance. However, the vast discrepancy between text and temporal data
often leads current multimodal architectures to over-emphasise one modality while
neglecting the other, resulting in information loss that harms forecasting performance.
To address this modality imbalance, we introduce BALM-TSF (Balanced Multimodal Alignment
for LLM-Based Time Series Forecasting), a lightweight time series forecasting framework
that maintains balance between the two modalities. Specifically, raw time series are
processed by the time series encoder, while descriptive statistics of raw time series
are fed to an LLM with a learnable prompt, producing compact textual embeddings. To
ensure balanced cross-modal context alignment of time series and textual embeddings,
a simple yet effective scaling strategy combined with a contrastive objective then
maps these textual embeddings into the latent space of the time series embeddings.
Finally, the aligned textual semantic embeddings and time series embeddings are together
integrated for forecasting. Extensive experiments on standard benchmarks show that,
with minimal trainable parameters, BALM-TSF achieves state-of-the-art performance
in both long-term and few-shot forecasting, confirming its ability to harness complementary
information from text and time series. Code is available at https://github.com/ShiqiaoZhou/BALM-TSF.
Calibrated and Diverse News Coverage
- Tianyi Zhou
- Stefan Neumann
- Kiran Garimella
- Aristides Gionis
In recent years, there has been a debate about whether automated news aggregators,
like Google News, lead readers to content that reinforces their existing beliefs and
restricts their exposure to a biased subset of perspectives. To avoid bias, it has
become common practice that news aggregators provide articles based on source diversity:
for each story, they pick articles from news sources with different political leanings.
In this paper, we ask whether this practice is sufficient. In particular, we study
how well the diversity of viewpoints, in particular with respect to entities, is covered
by articles picked using plain source diversity. We analyze a dataset fetched from
Google News and find that, even though the top articles exhibit some diversity with
respect to the leanings of the news outlets, many possible viewpoints towards the
entities are missing. Based on this observation we design novel methods for selecting
a small set of articles that cover all possible viewpoints; to ensure that our selections
are useful we show how to incorporate the user preferences into our model. Our experiments
on four real-world datasets show that our algorithms cover significantly more different
viewpoints than previous baselines.
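The selection problem described here, picking a small set of articles that together cover all viewpoints, has the flavor of set cover. The snippet below is only a generic greedy cover baseline over viewpoint sets, shown to make the coverage objective concrete; it is not the authors' algorithm and ignores the user-preference weighting the paper incorporates.

```python
def greedy_viewpoint_cover(article_viewpoints):
    """Greedily pick articles until every viewpoint seen in the pool is covered.

    article_viewpoints maps an article id to the set of (entity, stance) viewpoints
    it expresses. Standard greedy set-cover heuristic, for illustration only.
    """
    uncovered = set().union(*article_viewpoints.values())
    selected = []
    while uncovered:
        best = max(article_viewpoints, key=lambda a: len(article_viewpoints[a] & uncovered))
        gain = article_viewpoints[best] & uncovered
        if not gain:            # remaining articles add nothing new
            break
        selected.append(best)
        uncovered -= gain
    return selected

articles = {
    "a1": {("policy_x", "support"), ("policy_x", "oppose")},
    "a2": {("policy_x", "support"), ("senator_y", "neutral")},
    "a3": {("senator_y", "critical")},
}
print(greedy_viewpoint_cover(articles))  # e.g. ['a1', 'a2', 'a3'] depending on ties
```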
DebiasedKGE: Towards Mitigating Spurious Forgetting in Continual Knowledge Graph Embedding
- Junlin Zhu
- Bo Fu
- Guiduo Duan
To maintain an effective memory of old knowledge in a dynamically growing knowledge
environment, continual knowledge graph embedding (CKGE) focuses on alleviating catastrophic
forgetting. However, existing CKGE methods still suffer substantial performance degradation
in dynamic knowledge graphs (DKG). We have found this challenge is mainly posed by
spurious forgetting, a previously overlooked phenomenon that arises from the inherent
interference effects in the continual learning (CL) process. In this paper, we deeply
explore spurious forgetting in CKGE. First, we reveal two primary causes of spurious
forgetting, knowledge interference and knowledge misalignment, and show how they induce knowledge biasing within dynamic learning scenarios. Second, to fill this research gap, we propose
a robust and efficient CKGE method (DebiasedKGE) for mitigating spurious forgetting.
Specifically, to alleviate knowledge interference, we propose a mutual information-guided
disentangled learning mechanism, which identifies latent features of different knowledge
types and learns independent semantic representations for each, thereby reducing interference
in knowledge embedding. Furthermore, to mitigate the deviation of new knowledge from
previously learned knowledge, we design a dual-view regularized knowledge alignment
mechanism that jointly constrains both the magnitude and direction of embedding transitions.
Finally, we evaluate DebiasedKGE on four public CKGE datasets and two additional datasets
constructed to contain knowledge perturbations of different dimensions. The results
show that DebiasedKGE effectively alleviates spurious forgetting and achieves significant
performance improvements. Our codes and datasets are available at https://anonymous.4open.science/r/DebiasedKGE.
LatentExplainer: Explaining Latent Representations in Deep Generative Models with
Multimodal Large Language Models
- Mengdan Zhu
- Raasikh Kanjiani
- Jiahui Lu
- Andrew Choi
- Qirui Ye
- Liang Zhao
Deep generative models like VAEs and diffusion models have advanced various generation
tasks by leveraging latent variables to learn data distributions and generate high-quality
samples. Despite the field of explainable AI making strides in interpreting machine
learning models, understanding latent variables in generative models remains challenging.
This paper introduces LatentExplainer, a framework for automatically generating semantically
meaningful explanations of latent variables in deep generative models. LatentExplainer
tackles three main challenges: inferring the meaning of latent variables, aligning
explanations with inductive biases, and handling varying degrees of explainability.
Our approach perturbs latent variables, interprets changes in generated data, and
uses multimodal large language models (MLLMs) to produce human-understandable explanations.
We evaluate our proposed method on several real-world and synthetic datasets, and
the results demonstrate superior performance in generating high-quality explanations
for latent variables. The results highlight the effectiveness of incorporating inductive
biases and uncertainty quantification, significantly enhancing model interpretability.
FinCast: A Foundation Model for Financial Time-Series Forecasting
- Zhuohang Zhu
- Haodong Chen
- Qiang Qu
- Vera Chung
Financial time-series forecasting is critical for maintaining economic stability,
guiding informed policymaking, and promoting sustainable investment practices. However,
it remains challenging due to various underlying pattern shifts. These shifts arise
primarily from three sources: temporal non-stationarity (distribution changes over
time), multi-domain diversity (distinct patterns across financial domains such as
stocks, commodities, and futures), and varying temporal resolutions (patterns differing
across per-second, hourly, daily, or weekly indicators). While recent deep learning
methods attempt to address these complexities, they frequently suffer from overfitting
and typically require extensive domain-specific fine-tuning. To overcome these limitations,
we introduce FinCast, the first foundation model specifically designed for financial
time-series forecasting, trained on large-scale financial datasets. Remarkably, FinCast
exhibits robust zero-shot performance, effectively capturing diverse patterns without
domain-specific fine-tuning. Comprehensive empirical and qualitative evaluations demonstrate
that FinCast surpasses existing state-of-the-art methods, highlighting its strong
generalization capabilities.
FunLoc: A Novel Function-level Bug Localization Framework Enhanced by Contrastive
and Active Learning Strategies
- Ziye Zhu
- Liangliang Peng
- Yu Wang
- Yun Li
- Xianzhong Long
The increasing complexity of software systems has made them more prone to bugs, prompting
the development of automated bug localization techniques to ensure software reliability.
Although these techniques have demonstrated notable success at the file level, their
application and optimization at the function level often encounter serious performance
cliffs. This limitation underscores the urgent need for a dedicated framework for
function-level bug localization, which we address through FunLoc, a novel framework that takes coarse-grained source files as input units and identifies
fine-grained buggy functions as output. To address the critical challenges of handling
domain-specific bug reports and managing the vast function-level sample space, we introduce two key innovations that are seamlessly integrated into FunLoc. First, we design a contrastive learning-based domain-adaptive language model to
enhance the framework's ability to process and interpret specialized bug reports effectively.
Second, we propose an active learning-based dynamic negative sampling strategy to
address the scalability issues arising from the extensive function-level sample space.
To evaluate the effectiveness of our approach, we extend and release a function-level
bug localization dataset derived from large-scale real-world projects. Extensive experiments
demonstrate that our approach outperforms state-of-the-art techniques.
MGSTDN: Multi-Granularity Spatial-Temporal Diffusion Network for Next POI Recommendation
- Zhuang Zhuang
- Haitao Yuan
- Shanshan Feng
- Heng Qi
- Yanming Shen
- Baocai Yin
Next Point-of-Interest (POI) prediction is important to various human mobility applications,
such as route planning and location-based advertising. To address the spatial-temporal
sparsity issues arising from users' irregular and inconsistent visit times to different
POIs, multi-granular structures can be incorporated to enhance feature representation
through hierarchical relationships. However, existing methods often fall short in
capturing the comprehensive multi-granularity spatial-temporal correlations due to
three primary limitations: (1) users' complex mobility patterns entangled in single
trajectory data, (2) limited mobility pattern details due to independent modeling
at each granularity, and (3) low inference efficiency in cascaded multi-granularity
predictions. To tackle these challenges, we propose a novel approach that models transformations
across different granularities in both spatial regions and temporal periods as a diffusion
process, leading to the development of the Multi-Granularity Spatial-Temporal Diffusion
Network (MGSTDN). In particular, this model adopts a multi-task architecture, where
predictions at varying spatial-temporal granularities (i.e., different diffusion steps)
are treated as distinct tasks. By employing a multi-granularity diffusion mechanism
in both spatial and temporal dimensions, it captures more nuanced spatial-temporal
correlations, enhancing the physical constraints and behavioral pattern dependencies
across granularities. During the diffusion process's forward stage, coarser-grained
regions and periods are derived based on fine-grained features. In the reverse stage,
finer-grained regions and periods are recovered from coarse-grained features, guided
by encoded historical trajectory information, until the next POI is determined. To
improve computational efficiency, we introduce a multi-granularity mapping propagation
matrix, enabling parallel computation and accelerating the prediction process across
different granularities. We evaluated the effectiveness of MGSTDN through extensive
experiments on three datasets, demonstrating significant improvements over existing
methods.
Frequency-Decoupled Distillation for Efficient Multimodal Recommendation
- Ziyi Zhuang
- Hongji Li
- Junchen Fu
- Jiacheng Liu
- Joemon M. Jose
- Youhua Li
- Yongxin Ni
Multimodal recommender systems (MMRec) leverage multimodal features, such as visual
and textual data, to improve recommendation performance, playing a key role in platforms
like online shopping and short videos. However, the large modality encoders and complex
processing modules of MMRec significantly reduce its efficiency. A promising solution
is compressing MMRec into an ID-based MLP model (MLPRec), which has a simpler structure
and avoids complex modality handling. However, traditional knowledge distillation
methods struggle to transfer knowledge effectively from MMRec to MLPRec, due to differences
in their model structure and capacity. To address this, we propose a frequency-decoupled
knowledge distillation framework, FDRec, to efficiently transfer knowledge from MMRec
to MLPRec. By analyzing graph signals from a signal processing perspective, we propose
decoupling the distillation process into low-frequency and high-frequency components,
ensuring effective transmission of challenging high-frequency knowledge while preventing
it from being overshadowed by monotonous low-frequency signals. To address the instability
and fragmentation issues of KL divergence in traditional distillation approaches,
we introduce the Wasserstein distance, which captures geometric structure and provides
stable gradients. Additionally, FDRec incorporates an embedding-level contrastive
learning method, further enhancing the transfer of refined knowledge from MMRec and
injecting graph structure information into MLPRec for more effective distillation.
Extensive experiments on four benchmark datasets and five popular MMRec models show
that FDRec not only significantly reduces the computational costs and improves the
inference efficiency, but also achieves comparable or even superior performance compared
to MMRec. Our code is available at: https://github.com/Suehn/FDRec_
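The frequency decoupling referred to above can be made concrete with standard graph signal processing: a low-pass component obtained by propagating features over the normalized adjacency, and a high-pass residual. The sketch below shows only that decomposition on dense toy tensors; it is an assumed illustration, not FDRec's distillation pipeline.

```python
import torch

def frequency_decouple(features: torch.Tensor, adj: torch.Tensor):
    """Split node features into low- and high-frequency graph components.

    Low frequency: one step of symmetric normalized propagation (smoothed signal).
    High frequency: residual between the original signal and its smoothed version.
    Hypothetical sketch of the decomposition that would be distilled separately.
    """
    deg = adj.sum(dim=1).clamp(min=1e-12)
    d_inv_sqrt = deg.pow(-0.5)
    norm_adj = d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)
    low = norm_adj @ features          # smoothed (low-frequency) signal
    high = features - low              # detail (high-frequency) signal
    return low, high

adj = torch.tensor([[0., 1., 1.], [1., 0., 0.], [1., 0., 0.]])
x = torch.randn(3, 8)
low, high = frequency_decouple(x, adj)
print(low.shape, high.shape)
```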
Vulnerability-Aware Hardening for Secure Privacy-Preserving Record Linkage
- Sumayya Ziyad
- Peter Christen
- Anushka Vidanage
- Charini Nanayakkara
- Rainer Schnell
Privacy-Preserving Record Linkage (PPRL) aims to link records across multiple data
sources without revealing any sensitive information about the entities whose records
are being linked. However, recent studies have identified attacks that exploit multiple
vulnerabilities in popular PPRL methods. To address such vulnerabilities and prevent
possible reidentification, hardening techniques have been proposed to perturb patterns
in encodings. Most such hardening techniques are either specific to bit array based
encodings (such as Bloom filters), or they rely on randomness which can negatively
affect linkage quality. Here we propose a novel hardening technique that addresses
the frequency, similarity, and co-occurrence vulnerabilities, and is applicable to
any PPRL method that uses character q-grams. Our technique identifies and hardens
only those q-grams that are vulnerable, and modifies them using a non-random, context-aware
approach that ensures these q-grams are not vulnerable after hardening. We evaluate
our technique using real and synthetic data sets, and show that it substantially reduces
the vulnerabilities of PPRL encoding methods and makes them more secure.
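To make the q-gram focus concrete, the snippet below shows how character q-grams are extracted from attribute values and how frequency-based vulnerability could be flagged, since frequency attacks align the most common encoded q-grams with the most common plaintext q-grams. The padding and threshold logic are simplified assumptions, not the authors' context-aware hardening rule.

```python
from collections import Counter

def char_qgrams(value: str, q: int = 2):
    """Return the list of character q-grams of a (padded) attribute value."""
    padded = f"_{value.lower()}_"                 # simple boundary padding
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]

def frequent_qgrams(records, q=2, top_fraction=0.05):
    """Flag the most frequent q-grams across a data set as frequency-vulnerable.

    Simplified illustration: these common q-grams are natural candidates for
    hardening before encoding.
    """
    counts = Counter(g for rec in records for g in char_qgrams(rec, q))
    k = max(1, int(len(counts) * top_fraction))
    return {g for g, _ in counts.most_common(k)}

names = ["peter", "petra", "maria", "mario", "marta"]
print(frequent_qgrams(names, q=2, top_fraction=0.2))
```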
Relational Multi-Path Enhancement for Extrapolative Relation Reasoning in Temporal
Knowledge Graph
- Linlin Zong
- Chi Ma
- Jiahui Zhou
- Xinyue Liu
- Wenxin Liang
- Xianchao Zhang
- Bo Xu
Relation reasoning in temporal knowledge graph infers unknown or emerging relational
dependencies from historical structured data. Traditional approaches face inherent
limitations in capturing complex semantic correlations and structural patterns among
relations. To tackle this problem, we propose the Relational Multi-path Enhancement
network (RME), which primarily focuses on relation modeling to enrich relation representations
through comprehensive multi-path analysis. RME consists of five key components: (1)
Controlled random walk module creates multi-hop head-to-tail paths using an adaptive
stopping rule that balances short- and long-term connections. (2) Shared path extraction
module identifies both shared-head paths and shared-tail paths. (3) Time-decayed path
encoding module processes these paths differently. (4) Gated information aggregation
module combines path information to determine which parts matter most. (5) Attention
decoding module makes the final prediction by focusing on the most relevant path features.
Experiments on multiple TKG benchmark datasets demonstrate that RME outperforms the
state-of-the-art methods in relation multi-path reasoning.
SESSION: Short Research Papers
Explicit Path CGR: Maintaining Sequence Fidelity in Geometric Representations
We present a novel information-preserving Chaos Game Representation (CGR) method,
also called Reverse-CGR (R-CGR), for biological sequence analysis that addresses the
fundamental limitation of traditional CGR approaches: the loss of sequence information
during geometric mapping. Our method introduces complete sequence recovery through
explicit path encoding combined with rational arithmetic precision control, enabling
perfect sequence reconstruction from stored geometric traces. Unlike purely geometric
approaches, our reversibility is achieved through comprehensive path storage that
maintains both positional and character information at each step. We demonstrate the
effectiveness of R-CGR on biological sequence classification tasks, achieving competitive
performance compared to traditional sequence-based methods while providing interpretable
geometric visualizations. The approach generates feature-rich images suitable for
deep learning while maintaining complete sequence information through explicit encoding,
opening new avenues for interpretable bioinformatics analysis where both accuracy
and sequence recovery are essential.
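For readers unfamiliar with Chaos Game Representation, the sketch below shows the standard midpoint construction for DNA and the kind of explicit path storage that makes the mapping reversible: keeping the visited corner at every step suffices to recover the sequence exactly. Python's fractions module stands in for the rational arithmetic mentioned; none of this is the authors' exact implementation.

```python
from fractions import Fraction

# Corners of the unit square for the four nucleotides (a common CGR convention).
CORNERS = {"A": (Fraction(0), Fraction(0)), "C": (Fraction(0), Fraction(1)),
           "G": (Fraction(1), Fraction(1)), "T": (Fraction(1), Fraction(0))}
BASE_OF_CORNER = {v: k for k, v in CORNERS.items()}

def cgr_with_path(seq):
    """Map a DNA sequence to CGR points, storing the corner used at each step."""
    x, y = Fraction(1, 2), Fraction(1, 2)        # start at the centre
    points, path = [], []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2        # midpoint toward the base's corner
        points.append((x, y))
        path.append((cx, cy))                    # explicit path: enough to reverse
    return points, path

def recover_sequence(path):
    """Reconstruct the sequence from the stored path of visited corners."""
    return "".join(BASE_OF_CORNER[corner] for corner in path)

points, path = cgr_with_path("ACGTAC")
assert recover_sequence(path) == "ACGTAC"
print(points[-1])   # exact rational coordinates of the final point
```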
Uncovering the Persuasive Fingerprint of LLMs in Jailbreaking Attacks
- Havva Alizadeh Noughabi
- Julien Serbanescu
- Fattane Zarrinkalam
- Ali Dehghantanha
Despite recent advances, Large Language Models (LLMs) remain vulnerable to jailbreak
attacks that bypass alignment safeguards and elicit harmful outputs. While prior research
has proposed various attack strategies differing in human readability and transferability,
little attention has been paid to the linguistic and psychological mechanisms that
may influence a model's susceptibility to such attacks. In this paper, we examine
an interdisciplinary line of research that leverages foundational theories of persuasion
from the social sciences to craft adversarial prompts capable of circumventing alignment
constraints in LLMs. Drawing on well-established persuasive strategies, we hypothesize
that LLMs, having been trained on large-scale human-generated text, may respond more
compliantly to prompts with persuasive structures. Furthermore, we investigate whether
LLMs themselves exhibit distinct persuasive fingerprints that emerge in their jailbreak
responses. Empirical evaluations across multiple aligned LLMs reveal that persuasion-aware
prompts significantly bypass safeguards, demonstrating their potential to induce jailbreak
behaviors. This work underscores the importance of cross-disciplinary insight in addressing
the evolving challenges of LLM safety. The code and data are available at https://github.com/CyberScienceLab/Our-Papers/tree/main/PersuasiveJailbreaking/.
Compressed Concatenation of Small Embedding Models
- Ben Ayad Mohamed Ayoub
- Michael Dinzinger
- Kanishka Ghosh Dastidar
- Jelena Mitrović
- Michael Granitzer
Embedding models are central to dense retrieval, semantic search, and recommendation
systems, but their size often makes them impractical to deploy in resource-constrained
environments such as browsers or edge devices. While smaller embedding models offer
practical advantages, they typically underperform compared to their larger counterparts.
To bridge this gap, we demonstrate that concatenating the raw embedding vectors of
multiple small models can outperform a single larger baseline on standard retrieval
benchmarks. To overcome the resulting high dimensionality of naive concatenation,
we introduce a lightweight unified decoder trained with a Matryoshka Representation
Learning (MRL) loss. This decoder maps the high-dimensional joint representation to
a low-dimensional space, preserving most of the original performance without fine-tuning
the base models. We also show that while concatenating more base models yields diminishing
gains, the robustness of the decoder's representation under compression and quantization
improves. Our experiments show that, on a subset of MTEB retrieval tasks, our concat-encode-quantize
pipeline recovers 89% of the original performance with a 48× compression factor when
the pipeline is applied to a concatenation of four small embedding models.
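The concat-encode step can be pictured as a linear decoder trained so that nested prefixes of its output all remain useful, which is the essence of a Matryoshka-style loss. The sketch below shows that loss shape on a toy retrieval objective; the dimensions, prefix sizes, and cosine objective are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedDecoder(nn.Module):
    """Map the concatenation of several small embeddings to one low-dim vector."""

    def __init__(self, in_dim: int, out_dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, concat_emb):
        return self.proj(concat_emb)

def matryoshka_loss(query, doc, prefix_dims=(64, 128, 256)):
    """Average a cosine-alignment loss over nested prefixes of the decoded vectors,
    so truncated (compressed) representations stay usable. Illustrative only."""
    loss = 0.0
    for d in prefix_dims:
        q, p = F.normalize(query[:, :d], dim=-1), F.normalize(doc[:, :d], dim=-1)
        loss = loss + (1 - (q * p).sum(dim=-1)).mean()
    return loss / len(prefix_dims)

# Toy usage: four small models with 384-dim outputs, concatenated to 1536 dims.
decoder = UnifiedDecoder(in_dim=4 * 384, out_dim=256)
q_cat, d_cat = torch.randn(8, 1536), torch.randn(8, 1536)
loss = matryoshka_loss(decoder(q_cat), decoder(d_cat))
loss.backward()
print(float(loss))
```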
Enhancing Fake News Video Detection via LLM-Driven Creative Process Simulation
- Yuyan Bu
- Qiang Sheng
- Juan Cao
- Shaofei Wang
- Peng Qi
- Yuhui Shi
- Beizhe Hu
The emergence of fake news on short video platforms has become a significant new societal
concern, necessitating automatic video-news-specific detection. Current detectors
primarily rely on pattern-based features to separate fake news videos from real ones.
However, limited and less diversified training data lead to biased patterns and hinder
their performance. This weakness stems from the complex many-to-many relationships
between video material segments and fabricated news events in real-world scenarios:
a single video clip can be utilized in multiple ways to create different fake narratives,
while a single fabricated event often combines multiple distinct video segments. However,
existing datasets do not adequately reflect such relationships due to the difficulty
of collecting and annotating large-scale real-world data, resulting in sparse coverage
and non-comprehensive learning of the characteristics of potential fake news video
creation. To address this issue, we propose a data augmentation framework, AgentAug, which generates diverse fake news videos by simulating typical creative processes.
AgentAug implements multiple LLM-driven pipelines of four fabrication categories for
news video creation, combined with an active learning strategy based on uncertainty
sampling to select the potentially useful augmented samples during training. Experimental
results on two benchmark datasets demonstrate that AgentAug consistently improves
the performance of short video fake news detectors.
State & Geopolitical Censorship on Twitter (X): Detection & Impact Analysis of Withheld
Content
- Yusuf Mücahit Çetinkaya
- Tuğrulcan Elmas
State and geopolitical censorship on Twitter, now X, has become routine, raising concerns about the boundaries between criminal content and freedom of speech. One such censorship practice, withholding content in a particular state, has received renewed attention due to Elon Musk's apparent willingness to comply with state demands. In
this study, we present the first quantitative analysis of the impact of state censorship
by withholding on social media using a dataset in which two prominent patterns emerged:
Russian accounts censored in the EU for spreading state-sponsored narratives, and
Turkish accounts blocked within Turkey for promoting militant propaganda. We find
that censorship has little impact on posting frequency but significantly reduces likes
and retweets by 25%, and follower growth by 90%, especially when the censored region
aligns with the account's primary audience. Meanwhile, some Russian accounts continue
to experience growth as their audience is outside the withholding jurisdictions. We
develop a user-level binary classifier with a transformer backbone and temporal aggregation
strategies, aiming to predict whether an account is likely to be withheld. Through
an ablation study, we find that tweet content is the primary signal in predicting
censorship, while tweet metadata and profile features contribute marginally. Our best
model achieves an F1 score of 0.73 and an AUC of 0.83. This work informs debates on
platform governance, free speech, and digital repression.
T-Retrievability: A Topic-Focused Approach to Measure Fair Document Exposure in Information
Retrieval
- Xuejun Chang
- Zaiqiao Meng
- Debasis Ganguly
Retrievability of a document is a collection-based statistic that measures its expected
(reciprocal) rank of being retrieved within a specific rank cut-off. A collection
with uniformly distributed retrievability scores across documents is an indicator
of fair document exposure. While retrievability scores have been used to quantify
the fairness of exposure for a collection, in our work, we use the distribution of
retrievability scores to measure the exposure bias of retrieval models. We hypothesise
that an uneven distribution of retrievability scores across the entire collection
may not accurately reflect exposure bias but rather indicate variations in topical
relevance. As a solution, we propose a topic-focused localised retrievability measure,
which we call T-Retrievability (topic-retrievability); it first computes retrievability scores over multiple
groups of topically-related documents, and then aggregates these localised values
to obtain the collection-level statistics. Our analysis using this proposed T-Retrievability
measure uncovers new insights into the exposure characteristics of various neural
ranking models. The findings suggest that this localised measure provides a more nuanced
understanding of exposure fairness, offering a more reliable approach for assessing
document accessibility in IR systems.
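The underlying retrievability statistic is simple to state: each document accumulates a reciprocal-rank contribution from every query that retrieves it within a cut-off, and T-Retrievability then summarizes the scores within topical groups rather than over the whole collection. The sketch below uses a Gini coefficient as the per-group inequality summary; that choice and the data layout are assumptions for illustration.

```python
from collections import defaultdict

def retrievability(rankings, cutoff=100):
    """rankings: {query_id: ranked list of doc_ids}. Reciprocal-rank retrievability."""
    scores = defaultdict(float)
    for ranked in rankings.values():
        for rank, doc in enumerate(ranked[:cutoff], start=1):
            scores[doc] += 1.0 / rank
    return scores

def gini(values):
    """Gini coefficient of non-negative scores (0 = perfectly even exposure)."""
    vals = sorted(values)
    n, total = len(vals), sum(vals)
    if n == 0 or total == 0:
        return 0.0
    cum = sum((i + 1) * v for i, v in enumerate(vals))
    return (2 * cum) / (n * total) - (n + 1) / n

def topic_retrievability(rankings, doc_topics, cutoff=100):
    """Aggregate retrievability inequality within each topical group of documents."""
    r = retrievability(rankings, cutoff)
    by_topic = defaultdict(list)
    for doc, topic in doc_topics.items():
        by_topic[topic].append(r.get(doc, 0.0))
    return {topic: gini(vals) for topic, vals in by_topic.items()}

rankings = {"q1": ["d1", "d2", "d3"], "q2": ["d2", "d4"]}
doc_topics = {"d1": "health", "d2": "health", "d3": "sports", "d4": "sports"}
print(topic_retrievability(rankings, doc_topics))
```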
Pruning Strategies for Backdoor Defense in LLMs
- Santosh Chapagain
- Shah Muhammad Hamdi
- Soukaina Filali Boubrahimi
Backdoor attacks are a significant threat to the performance and integrity of pre-trained
language models. Although such models are routinely fine-tuned for downstream NLP
tasks, recent work shows they remain vulnerable to backdoor attacks that survive vanilla
fine-tuning. These attacks are difficult to defend because end users typically lack
knowledge of the attack triggers. Such attacks consist of stealthy malicious triggers
introduced through subtle syntactic or stylistic manipulations, which can bypass traditional
detection and remain in the model, making post-hoc purification essential. In this
study, we explore whether attention-head pruning can mitigate these threats without
any knowledge of the trigger or access to a clean reference model. To this end, we
design and implement six pruning-based strategies: (i) gradient-based pruning, (ii)
layer-wise variance pruning, (iii) gradient-based pruning with structured L1/L2 sparsification,
(iv) randomized ensemble pruning, (v) reinforcement-learning-guided pruning, and (vi)
Bayesian uncertainty pruning. Each method iteratively removes the least informative
heads while monitoring validation accuracy to avoid over-pruning. Experimental evaluation
shows that gradient-based pruning performs best when defending against syntactic triggers,
whereas reinforcement learning and Bayesian pruning better withstand stylistic attacks.
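Gradient-based head pruning in the sense described can be approximated by attaching a per-head mask of ones and scoring each head by the gradient magnitude of the loss with respect to its mask entry; the least informative heads are then zeroed out iteratively while validation accuracy is monitored. The sketch below shows only that scoring idea on one attention layer's output with a stand-in loss; it is a simplified assumption, not any of the six strategies' exact procedure.

```python
import torch

def head_importance_scores(attn_out, num_heads, downstream_loss):
    """Score each attention head by |dL/d mask_h| for a per-head mask of ones.

    attn_out: (batch, seq, num_heads * head_dim) output of one attention layer.
    downstream_loss: callable mapping the masked layer output to a scalar loss,
    standing in for the rest of the fine-tuned model on clean validation data.
    Heads with the smallest scores are candidates for pruning.
    """
    b, s, d = attn_out.shape
    head_dim = d // num_heads
    mask = torch.ones(num_heads, requires_grad=True)
    masked = (attn_out.view(b, s, num_heads, head_dim) * mask.view(1, 1, -1, 1)).view(b, s, d)
    grads = torch.autograd.grad(downstream_loss(masked), mask)[0]
    return grads.abs()

attn_out = torch.randn(2, 5, 8 * 16)
scores = head_importance_scores(attn_out, num_heads=8,
                                downstream_loss=lambda h: h.pow(2).mean())
least_informative = torch.argsort(scores)[:2]   # prune these heads first
print(scores, least_informative)
```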
More Women, Same Stereotypes: Unpacking the Gender Bias Paradox in Large Language
Models
- Evan Chen
- Run-Jun Zhan
- Yan-Bai Lin
- Hung-Hsuan Chen
Large Language Models (LLMs) have revolutionized natural language processing, yet
concerns persist regarding their tendency to reflect or amplify social biases. This
study introduces a novel evaluation framework to uncover gender biases in LLMs: using
free-form storytelling to surface biases embedded within the models. A systematic
analysis of ten prominent LLMs shows a consistent pattern of overrepresenting female
characters across occupations, likely due to supervised fine-tuning (SFT) and reinforcement
learning from human feedback (RLHF). Paradoxically, despite this overrepresentation,
the occupational gender distributions produced by these LLMs align more closely with
human stereotypes than with real-world labor data. This highlights the challenge and
importance of implementing balanced mitigation measures to promote fairness and prevent
the establishment of potentially new biases. We release the prompts and LLM-generated
stories on GitHub.
Improving Graph Autoencoders by Hard Sample Refinement with Global Similarity
- Ge Chen
- Yulan Hu
- Sheng Ouyang
- Cuicui Luo
Masked graph autoencoders (GAEs) have attracted significant attention in recent years.
GAEs typically leverage graph neural networks to reconstruct topological properties
and node features. However, existing feature-based GAEs face performance bottlenecks,
particularly on hard-to-reconstruct nodes, due to their excessive reliance on local
aggregation. To address this limitation, we propose a novel framework, Global-Similarity-Enhanced
Graph Autoencoder (GSE-GAE). GSE-GAE adopts a knowledge distillation strategy within
a self-supervised teacher-student architecture. Specifically, a teacher module integrates
raw features and topology with long-range structural augmentations for hard nodes,
while a representation alignment loss ensures effective transfer of global knowledge
to the student model. Extensive experiments demonstrate the superiority of GSE-GAE,
providing new insights into improving performance.
Multimodal Contrastive Learning with Early Fusion for Robust Medical Signal Representation
- Lei Chen
- Kyoungsuk Park
- Junetae Kim
Contrastive learning has achieved remarkable success in the representation learning
of physiological signals foundation models. However, current approaches often focus
on unimodal settings or treat each modality independently, neglecting the rich synergistic
information that can emerge from cross-modal interactions. This leads to suboptimal
representations that fail to capture complex interdependencies across modalities.
To address this limitation, we propose a multimodal contrastive learning framework
that aligns fused representations instead of individual signals. Specifically, we
encode ECG Lead II, ECG Lead V, and PPG signals using modality-specific encoders,
followed by a fusion block to integrate modality-specific embeddings into a unified
representation. By applying contrastive learning to the unified representation, our
approach effectively mitigates inter-modal conflicts while capturing complementary
cross-modal features that would otherwise be lost in traditional alignment strategies.
Experiments on MIMIC-III (internal) and VitalDB (external) datasets demonstrate that
our approach outperforms existing baselines in patient attribute prediction tasks,
validating its effectiveness in learning comprehensive multimodal representations
of physiological signals.
G2IFS: Global-to-Instance Feature Selection in Deep Recommender System
- Lijin Chen
- Linjing You
- Jiabao Lu
- Xiayuan Huang
- Xiangli Nie
Feature selection plays a vital role in recommender systems by identifying informative
features for accurate prediction. While adaptive methods like AdaFS select instance-wise
features based on sample variability, they often overlook globally important features
and suffer from limited transferability. To address these limitations, we propose
G2IFS (Global-to-Instance Feature Selection), a novel framework that integrates global
distributional patterns with instance-level adaptation for more robust and generalizable
feature selection. G2IFS consists of an online statistics module, a main scoring network,
and a non-parametric Gaussian mixture module. The online statistics module maintains
global estimates of class-wise statistics to compute Fisher scores, which guide the
scoring network in learning instance-specific feature importance. The Gaussian module
further mitigates co-adaptation and improves transferability. Extensive experiments
across diverse recommendation models and real-world datasets show that G2IFS consistently
outperforms state-of-the-art baselines in terms of accuracy, efficiency, and transferability.
In-depth analysis further reveals that various global importance signals, when integrated into traditional methods like AdaFS, consistently lead to significant performance improvements,
underscoring the general effectiveness of combining global and instance-level signals
in recommender system feature selection. The code is available at https://github.com/youlj109/G2IFS.
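The Fisher score used above as a global importance signal has a closed form: the between-class variance of a feature's class means relative to its within-class variance. The snippet below computes it from class-wise counts, means, and variances, the kind of statistics an online module can maintain; it is an illustration under assumed data layout, not the G2IFS code.

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher score per feature: between-class variance / within-class variance.

    X: (n_samples, n_features) array, y: integer class labels.
    Larger scores indicate features whose class means are well separated
    relative to their spread, a natural global importance signal.
    """
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between, within = np.zeros(X.shape[1]), np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        n_c = len(Xc)
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    return between / np.maximum(within, 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)  # feature 0 is informative
print(fisher_scores(X, y).round(3))                          # feature 0 should dominate
```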
VQA-Induct: Instruction Induction for Visual Question Answering
- Po-Chun Chen
- Hen-Hsen Huang
- Hsin-Hsi Chen
Multimodal Large Language Models (MLLMs) have shown strong capabilities in Visual
Question Answering (VQA) tasks. However, current approaches for enhancing VQA reasoning
performance often assume access to extensive resources such as large annotated datasets,
external tools, or numerous demonstrations, which are impractical for real-world users
who typically possess only a few demonstrations. We present VQA-Induct, a framework
for data-scarce scenarios that leverages MLLMs' instruction induction capabilities
to induce reusable, purely textual task-level instructions from as few as three demonstrations
of the same task, then applies these instructions to new instances using only their
image-question pairs. Comprehensive experiments on PuzzleVQA and AlgoPuzzleVQA across
diverse MLLMs demonstrate that our method outperforms state-of-the-art methods without
requiring demonstrations at inference time. Furthermore, instructions induced by stronger
models effectively boost the performance of smaller models, enabling cost-efficient
reasoning at inference time.
Arrows of Math Reasoning Data Synthesis for Large Language Models: Diversity, Complexity
and Correctness
- Sirui Chen
- Changxin Tian
- Binbin Hu
- Kunlong Chen
- Ziqi Liu
- Zhiqiang Zhang
- Jun Zhou
Enhancing the mathematical reasoning of large language models (LLMs) demands high-quality
training data, yet conventional methods face critical challenges in scalability, cost,
and data reliability. To address these limitations, we propose a novel program-assisted
synthesis framework that systematically generates a high-quality mathematical corpus
with guaranteed diversity, complexity, and correctness. This framework integrates
mathematical knowledge systems and domain-specific tools to create executable programs.
These programs are then translated into natural language problem-solution pairs and
vetted by a bilateral validation mechanism that verifies solution correctness against
program outputs and ensures program-problem consistency. We have generated 12.3 million
such problem-solving triples. Experiments demonstrate that models fine-tuned on our
data significantly improve their inference capabilities, achieving state-of-the-art
performance on several benchmark datasets and showcasing the effectiveness of our
synthesis approach.
Toward Secure Federated Partial Label Learning Against Poisoning Attacks
- Xubin Chen
- Zhengjie Yang
- Xinyi Sheng
- Sen Fu
- Wei Bao
How to defend against attacks in Federated Partial Label Learning (FedPLL) is a brand
new and challenging question in machine learning security due to stealthy and efficient
attack behaviors of adversaries. In this paper, we systematically study this problem
by developing an Adaptive Partial Label Attack (APLA) which subtly manipulates the
candidate label set of the data sample. To defend against APLA, we develop the RobustFedPLL
framework incorporating three modules: (1) in preliminary clustering, we implement
a Gaussian Mixture Model (GMM) and a moving average mechanism to identify clients'
confidence; (2) in representation contrasting, we develop a contrast-based algorithm
to obtain clients' model feature representations; (3) in final clustering, we utilize
mainstream clustering algorithms to finally distinguish adversaries. Experiments of
RobustFedPLL and SOTA defense algorithms based on two datasets are conducted, demonstrating
the superiority of RobustFedPLL under various experimental settings.
Integrating Time Series into LLMs via Multi-layer Steerable Embedding Fusion for Enhanced
Forecasting
- Zhuomin Chen
- Dan Li
- Jiahui Zhou
- Shunyu Wu
- Haozheng Ye
- Jian Lou
- See-Kiong Ng
Time series (TS) data are ubiquitous across various application areas, rendering time
series forecasting (TSF) a fundamental task. With the astounding advances in large
language models (LLMs), a variety of methods have been developed to adapt LLMs for
time series forecasting. Despite unlocking the potential of LLMs in comprehending
TS data, existing methods are inherently constrained by their shallow integration
of TS information, wherein LLMs typically access TS representations at shallow layers,
primarily at the input layer. This causes the influence of TS representations to progressively
fade in deeper layers and eventually leads to ineffective adaptation between textual
embeddings and TS representations. In this paper, we propose the Multi-layer Steerable
Embedding Fusion (MSEF), a novel framework that enables LLMs to directly access time
series patterns at all depths, thereby mitigating the progressive loss of TS information
in deeper layers. Specifically, MSEF leverages off-the-shelf time series foundation
models to extract semantically rich embeddings, which are fused with intermediate
text representations across LLM layers via layer-specific steering vectors. These
steering vectors are designed to continuously optimize the alignment between time
series and textual modalities and facilitate a layer-specific adaptation mechanism
that ensures efficient few-shot learning capabilities. Experimental results on seven
benchmarks demonstrate significant performance improvements by MSEF compared with
baselines, with an average reduction of 31.8% in terms of MSE. The code is available
at https://github.com/One1sAll/MSEF.
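The layer-specific steering described above can be pictured as adding a learned vector, derived from the time-series embedding, to the hidden states at every layer via forward hooks. The snippet below is a generic PyTorch illustration of that injection pattern with made-up module names and a toy stack of linear blocks standing in for transformer layers; it is not the MSEF implementation.

```python
import torch
import torch.nn as nn

class SteeringInjector:
    """Add a layer-specific steering vector to each block's output via forward hooks.

    Hypothetical sketch: ts_embedding would come from a frozen time-series foundation
    model; one small projection per layer turns it into a steering vector added to the
    hidden states, keeping time-series information alive at every depth.
    """

    def __init__(self, blocks, ts_dim, hidden_dim):
        self.projections = nn.ModuleList(nn.Linear(ts_dim, hidden_dim) for _ in blocks)
        self.ts_embedding = None
        for block, proj in zip(blocks, self.projections):
            block.register_forward_hook(self._make_hook(proj))

    def _make_hook(self, proj):
        def hook(module, inputs, output):
            steer = proj(self.ts_embedding)          # (batch, hidden_dim)
            return output + steer.unsqueeze(1)       # broadcast over sequence positions
        return hook

blocks = nn.ModuleList(nn.Linear(32, 32) for _ in range(4))   # toy "LLM" layers
injector = SteeringInjector(blocks, ts_dim=16, hidden_dim=32)
injector.ts_embedding = torch.randn(2, 16)
h = torch.randn(2, 10, 32)
for block in blocks:
    h = block(h)
print(h.shape)
```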
FUTURE: Flexible Unlearning for Tree Ensemble
- Ziheng Chen
- Jin Huang
- Jiali Cheng
- Yuchan Guo
- Mengjie Wang
- Lalitesh Morishetti
- Kaushiki Nag
- Hadi Amiri
Tree ensembles are widely recognized for their effectiveness in classification tasks,
achieving state-of-the-art performance across diverse domains, including bioinformatics,
finance, and medical diagnosis. With increasing emphasis on data privacy and the right to be forgotten, several unlearning algorithms have been proposed to enable tree ensembles to forget
sensitive information. However, existing methods are often tailored to a particular
model or rely on the discrete tree structure, making them difficult to generalize
to complex ensembles and inefficient for large-scale datasets. To address these limitations,
we propose FUTURE, a novel unlearning algorithm for tree ensembles. Specifically,
we formulate the problem of forgetting samples as a gradient-based optimization task.
To accommodate the non-differentiability of tree ensembles, we adopt probabilistic
model approximations within the optimization framework. This enables end-to-end unlearning
in an effective and efficient manner. Extensive experiments on real-world datasets
show that FUTURE yields significant and successful unlearning performance.
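The probabilistic model approximation step can be illustrated with the common trick of replacing a hard axis-aligned split by a sigmoid gate, so path probabilities, and therefore the ensemble's predictions, become differentiable and a gradient-based unlearning objective can be optimized end to end. The sketch below shows a single soft decision stump; the temperature, structure, and unlearning objective are all assumptions rather than the FUTURE method.

```python
import torch

def soft_stump(x, feature, threshold, leaf_values, temperature=0.1):
    """Differentiable approximation of a depth-1 decision tree (a stump).

    The hard rule `x[feature] <= threshold` is relaxed into a sigmoid gate,
    so gradients can flow to the threshold and leaf values during an
    optimization-based unlearning step. Illustrative sketch only.
    """
    gate = torch.sigmoid((threshold - x[:, feature]) / temperature)  # P(go left)
    return gate * leaf_values[0] + (1 - gate) * leaf_values[1]

x = torch.randn(5, 3)
threshold = torch.tensor(0.0, requires_grad=True)
leaves = torch.tensor([0.2, 0.8], requires_grad=True)
pred = soft_stump(x, feature=1, threshold=threshold, leaf_values=leaves)
# A forget-set loss term would push these predictions away from their original
# labels while a retain-set term preserves the remaining behaviour.
pred.sum().backward()
print(threshold.grad, leaves.grad)
```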
Contrastive ECOC: Learning Output Codes for Adversarial Defense
- Che-Yu Chou
- Hung-Hsuan Chen
Although one-hot encoding is commonly used for multiclass classification, it is not
always the most effective encoding mechanism. Error Correcting Output Codes (ECOC)
address multiclass classification by mapping each class to a unique codeword used
as a label. Traditional ECOC methods rely on manually designed or randomly generated
codebooks, which are labor-intensive and may yield suboptimal, dataset-agnostic results.
This paper introduces three models for automated codebook learning based on contrastive
learning, allowing codebooks to be learned directly and adaptively from data. Across
four datasets, our proposed models demonstrate superior robustness to adversarial
attacks compared to two baselines. The source code is available on GitHub.
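ECOC classification itself is compact enough to show directly: each class is assigned a binary codeword, the model predicts a bit vector, and the label is the class whose codeword is nearest in Hamming distance. The snippet below shows only that decode step with a fixed random codebook; the paper's contribution of learning the codebook contrastively is not reproduced here.

```python
import numpy as np

def ecoc_decode(bit_logits, codebook):
    """Assign each sample to the class whose codeword is nearest in Hamming distance.

    bit_logits: (n_samples, code_len) real-valued outputs of the bit predictors.
    codebook:   (n_classes, code_len) binary codewords (one row per class).
    """
    bits = (bit_logits > 0).astype(int)
    distances = np.abs(bits[:, None, :] - codebook[None, :, :]).sum(axis=2)
    return distances.argmin(axis=1)

rng = np.random.default_rng(0)
codebook = rng.integers(0, 2, size=(4, 10))      # 4 classes, 10-bit codewords
logits = rng.normal(size=(3, 10))
print(ecoc_decode(logits, codebook))
```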
MU-OT: Effective and Unified Machine Unlearning with Optimal Transport for Feature
Realignment
- Sangjun Chung
- Simon S. Woo
Machine unlearning has emerged as a significant research topic in response to the
increasing demands for data privacy and compliance with privacy regulations. The main
challenge is to eliminate the influence of a specific subset of training data from
a pretrained model while preserving the model's performance on the retain set without
retraining the model from scratch. In this paper, we propose a novel efficient unlearning
framework based on Optimal Transport, which can effectively work on both class-wise
and instance-wise unlearning tasks. By analyzing and comparing the feature spaces
of the original and retrained models, we formulate the unlearning problem as a distribution
alignment task between the forget set and the retain set. We guide the feature distribution
of the forget set, which initially forms distinct and structured patterns, to align
with that of the retain set. Extensive experiments on three public benchmark datasets
demonstrate its superior effectiveness compared to previous state-of-the-art methods.
H-PRM: A Pluggable Hotword Pre-Retrieval Module for Various Speech Recognition Systems
- Huangyu Dai
- Lingtao Mao
- Ben Chen
- Zihan Wang
- Zihan Liang
- Ying Han
- Chenyi Lei
- Han Li
Hotword customization is crucial in ASR to enhance the accuracy of domain-specific
terms. It has been primarily driven by the advancements in traditional models and
Audio large language models (LLMs). However, existing models often struggle with large-scale
hotwords, since the recognition rate drops dramatically as the number of hotwords increases.
In this paper, we introduce a novel hotword customization system that utilizes a hotword
pre-retrieval module (H-PRM) to identify the most relevant hotword candidate by measuring
the acoustic similarity between the hotwords and the speech segment. This plug-and-play
solution can be easily integrated into traditional models such as SeACo-Paraformer,
significantly enhancing the hotword post-recall rate (PRR). Additionally, we incorporate
H-PRM into Audio LLMs through a prompt-based approach, enabling seamless customization
of hotwords. Extensive testing validates that H-PRM can outperform existing methods,
showing a new direction for hotword customization in ASR.
DP-COMET: A Differential Privacy Contextual Obfuscation MEchanism for Texts in Natural
Language Processing
- Francesco Luigi De Faveri
- Guglielmo Faggioli
- Nicola Ferro
Protecting sensitive information within textual data strongly depends on the context
in which the data is presented. However, current privacy-preserving obfuscation mechanisms
based on epsilon-Differential Privacy (DP) produce an obfuscated private text, changing
the original phrase term-by-term without considering the context in which such a term
is placed. This paper introduces DP-COMET, an epsilon-DP obfuscation mechanism that
evaluates a text's context before producing its private version. The mechanism defines
a representation of the original text that considers the entire context within the
text, producing an obfuscated version after adding noise to this representation and
depending on the privacy parameter epsilon. We test DP-COMET on different Natural
Language Processing (NLP) and Information Retrieval (IR) downstream tasks, and our
findings show that our obfuscation mechanism not only achieves comparable performance
results to traditional term-by-term mechanisms but also produces obfuscated texts
less similar to the originals. To promote the reproducibility of DP-COMET, we make
the code publicly available at https://github.com/Kekkodf/DP-COMET.
+VeriRel: Verification Feedback to Enhance Document Retrieval for Scientific Fact
Checking
- Xingyu Deng
- Xi Wang
- Mark Stevenson
Identification of appropriate supporting evidence is critical to the success of scientific
fact checking. However, existing approaches rely on off-the-shelf Information Retrieval
algorithms that rank documents based on relevance rather than the evidence they provide
to support or refute the claim being checked. This paper proposes +VeriRel which includes
verification success in the document ranking. Experimental results on three scientific
fact checking datasets (SciFact, SciFact-Open and Check-Covid) demonstrate consistently
leading performance by +VeriRel for document evidence retrieval and a positive impact
on downstream verification. This study highlights the potential of integrating verification
feedback into document relevance assessment for effective scientific fact checking systems. It also points to promising future work on evaluating fine-grained relevance when examining complex documents for advanced scientific fact checking.
POLAR: Policy Optimization for Literature Analysis under Review Constraints
Systematic reviews are vital for evidence-based decision-making but remain resource-intensive
due to the volume of literature requiring expert screening. Technology-Assisted Review
(TAR) systems offer a solution by ranking documents for review, yet questions remain
about how best to allocate limited human effort across multiple review topics. In
this paper, we explore the problem of effort distribution by comparing alternative
screening policies under fixed effort constraints. Using real-world data from the
CLEF eHealth 2017-2019 TAR tasks, we evaluate both baseline and adaptive policies
that account for topic size, screening depth, and residual uncertainty. We introduce
effort-aware evaluation metrics to measure trade-offs between review effectiveness
and resource use. Our results show that simple, topic-sensitive policies can significantly
improve the yield of relevant documents discovered, offering practical insights for
scalable and equitable systematic review workflows.
AI on the Pulse: Real-Time Health Anomaly Detection with Wearable and Ambient Intelligence
- Davide Gabrielli
- Bardh Prenkaj
- Paola Velardi
- Stefano Faralli
We introduce AI on the Pulse, a real-world-ready anomaly detection system that continuously
monitors patients using a fusion of wearable sensors, ambient intelligence, and advanced
AI models. Powered by UniTS, a state-of-the-art (SoTA) universal time-series model,
our framework autonomously learns each patient's unique physiological and behavioral
patterns, detecting subtle deviations that signal potential health risks. Unlike classification
methods that require impractical, continuous labeling in real-world scenarios, our
approach uses anomaly detection to provide real-time, personalized alerts for reactive
home-care interventions. Our approach outperforms 12 SoTA anomaly detection methods,
demonstrating robustness across both high-fidelity medical devices (ECG) and consumer
wearables, with a ~22% improvement in F1 score. However, the true impact of AI on
the Pulse lies in @HOME, where it has been successfully deployed for continuous, real-world
patient monitoring. By operating with non-invasive, lightweight devices like smartwatches,
our system proves that high-quality health monitoring is possible without clinical-grade
equipment. Beyond detection, we enhance interpretability by integrating LLMs, translating
anomaly scores into clinically meaningful insights for healthcare professionals.
Think it Image by Image: Multi-Image Moral Reasoning of Large Vision-Language Models
- Chujie Gao
- Yue Huang
- Xiangqi Wang
- Siyuan Wu
- Nitesh V. Chawla
- Xiangliang Zhang
Vision Language Models (VLMs) have demonstrated remarkable success in downstream applications,
yet they often exhibit biases, raising ethical concerns. While previous efforts have
aimed to evaluate and improve the moral reasoning capabilities of VLMs, existing approaches
are limited by simplified, unimodal settings or overly static visual scenarios. To address these limitations, we propose MIST (Moral Inference through Storytelling with Text and Images), a novel multi-image-based dataset pipeline designed to assess moral reasoning in complex, dynamic scenarios. To ensure better alignment between these modalities, we
introduce the concept of ''text-image flow,'' which seamlessly integrates visual and
textual information across complex scenarios. Using this dataset, we evaluate seven
widely used VLMs, offering critical insights into their performance in moral reasoning
tasks.
LLM-OFA: On-the-Fly Adaptation of Large Language Models to Address Temporal Drift
Across Two Decades of News
- Pouya Ghahramanian
- Sepehr Bakhshi
- Fazli Can
We investigate the problem of on-the-fly adaptation (OFA) with online feedback for large language models (LLMs) in the context of temporally
evolving data. In this setting, each incoming instance, or a small batch, is first
processed for inference, and its true label is revealed immediately after prediction,
allowing the model to be updated in a sequential, single-pass manner. While pre-trained
LLMs achieve state-of-the-art results across NLP tasks, they often struggle to generalize
under dynamic distribution shifts, particularly in continuously evolving environments.
Despite the importance of this problem, existing research on online adaptation of
LLMs remains limited, and there is a lack of large-scale benchmarks for evaluating
such methods. To address these gaps, we introduce 1M-News, a large-scale benchmark of one million New York Times headlines spanning two decades,
and benchmark six state-of-the-art LLMs by fine-tuning them on the first 10 years
and applying OFA on the following 10 years. To improve adaptation performance, we
develop Adaptimizer, the first optimizer specifically designed for OFA, enabling rapid and stable model
updates under temporal distribution shift. Adaptimizer maintains two sets of weights, fast and slow, balancing rapid adaptation with long-term stability and generalization across
the stream. Our experiments demonstrate that OFA with Adaptimizer achieves consistent
improvements over static baselines. All code and data are publicly available at https://github.com/pouyaghahramanian/LLM-OFA.
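The fast/slow-weight idea attributed to Adaptimizer above can be illustrated generically: fast weights take the per-step gradient update on each newly labeled instance, slow weights track them with an exponential moving average, and the deployed parameters blend the two for stability. The sketch below is that generic pattern under assumed hyperparameters, not the released optimizer.

```python
import torch

class FastSlowUpdater:
    """Maintain fast weights (per-step SGD) and slow weights (EMA of the fast ones).

    Hypothetical sketch of a fast/slow scheme for on-the-fly adaptation: the fast
    copy reacts to each newly revealed label, the slow copy preserves stability,
    and the model keeps a blend of both.
    """

    def __init__(self, model, lr=1e-3, slow_momentum=0.99, mix=0.5):
        self.model = model
        self.lr, self.slow_momentum, self.mix = lr, slow_momentum, mix
        self.slow = [p.detach().clone() for p in model.parameters()]

    def step(self, loss):
        grads = torch.autograd.grad(loss, list(self.model.parameters()))
        with torch.no_grad():
            for p, g, s in zip(self.model.parameters(), grads, self.slow):
                p -= self.lr * g                                    # fast update
                s.mul_(self.slow_momentum).add_(p, alpha=1 - self.slow_momentum)
                p.copy_(self.mix * p + (1 - self.mix) * s)          # blend for stability

model = torch.nn.Linear(8, 2)
updater = FastSlowUpdater(model)
x, y = torch.randn(16, 8), torch.randint(0, 2, (16,))
loss = torch.nn.functional.cross_entropy(model(x), y)
updater.step(loss)
print(loss.item())
```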
Approximating Gradient-Based Influence for Scalable Instruction Data Selection
- Mohammad Gharehhasanloo
- Yueting Chen
- Nick Koudas
- Xiaohui Yu
Instruction Tuning (IT) is crucial for enhancing Large Language Models (LLMs), but
training on all available instructions is often unnecessary and computationally costly.
Recent studies show that small, well-chosen subsets can match or exceed full dataset
performance, motivating efficient data selection techniques. While gradient-based
methods like LESS estimate sample influence effectively, they are expensive due to
per-sample gradient computation. We propose Approx-LESS, a scalable alternative that
computes LoRA-based gradient features for a small fraction of the samples and trains
regression models to predict influence scores for the rest. This enables selection
and tuning on the most impactful samples. On three validation sets with a fixed 270K
instruction corpus, Approx-LESS outperforms applicable baselines and closely matches
LESS, reducing gradient extraction time by over 3x. It also shows high sample selection
overlap with LESS, making it an effective, low-cost method for influence-based instruction
tuning.
Time-Period-Aware Embedding Regeneration for Session-Based Recommendation
- Cheng Guo
- Rui Xue
- Jeff Zhang
Session-based recommender systems typically focus on intra-session user behavior but
often overlook the macro-level temporal evolution of items themselves. To address
this gap, we introduce a model that explicitly captures item dynamics by regenerating
time-period-aware embeddings. Our approach partitions the data timeline into several
distinct periods and employs a simple yet effective module, TEG, which uses a GRU
and a causal attention layer to recurrently learn how item representations evolve
from one period to the next. This allows our model to capture global, long-term trends
while its gating mechanism naturally accommodates both dynamic and static items. Extensive
experiments on three real-world datasets show that our method achieves highly competitive
performance against complex state-of-the-art models. More importantly, it demonstrates
the significant and complementary value of modeling global item evolution, providing
a new dimension for improving session-based recommendation.
Pseudo-Inverse Prefix Tuning for Effective Unlearning in LLMs
- Preethi Gurumurthy
- P.K. Srijith
Large Language Models (LLMs) are widely used in many real-world applications, but
their deployment raises concerns about data privacy and compliance with regulations
such as the right to be forgotten. To address these challenges, we explore the problem
of machine unlearning, selectively removing the influence of specific training data
from a model. While many existing approaches require retraining the entire model and
access to both forget data and retain data, we propose Pseudo-Inverse Prefix Tuning (PI-Prefix), a parameter-efficient fine-tuning method that enables targeted forgetting with minimal
overhead. PI-Prefix learns a small set of prefix parameters on the data to be forgotten
and then applies a pseudo-inverse transformation to unlearn the forget data while maintaining
performance on retain data. Our experiments on two sentiment classification tasks
(SST-2 and Yelp) demonstrate that PI-Prefix achieves effective and interpretable forgetting,
with forget-set performance approaching random prediction. It preserves strong generalization
on the retain set even though the retain set is not required during unlearning. These results highlight
PI-Prefix as a promising direction for scalable and compliant unlearning in data removal
contexts.
Study on LLMs for Promptagator-Style Dense Retriever Training
- Daniel Gwon
- Nour Jedidi
- Jimmy Lin
Promptagator demonstrated that Large Language Models (LLMs) with few-shot prompts
can be used as task-specific query generators for fine-tuning domain-specialized dense
retrieval models. However, the original Promptagator approach relied on proprietary
and large-scale LLMs which users may not have access to or may be prohibited from
using with sensitive data. In this work, we study the impact of open-source LLMs at
accessible scales (≤14B parameters) as an alternative. Our results demonstrate that
open-source LLMs as small as 3B parameters can serve as effective Promptagator-style
query generators. We hope our work provides practitioners with reliable alternatives
for synthetic data generation and offers insights for maximizing fine-tuning results in
domain-specific applications. Our code is available at https://www.github.com/mitll/promptodile.
DocPolicyKG: A Lightweight LLM-Based Framework for Knowledge Graph Construction from
Chinese Policy Documents
- Chen Han
- Yuanyuan Li
- Xijin Tang
Chinese policy documents are typically written in a concise yet contextually rich
style, containing implicit hierarchical logic and strategic intent. These linguistic
and structural features present challenges for traditional information extraction
methods, which often struggle with cross-sentence dependencies and semantic complexity.
To address these challenges, we propose DocPolicyKG, a novel framework for constructing knowledge
graphs from Chinese policy documents using lightweight large language models (LLMs),
integrating domain ontology, fine-tuning, and prompt engineering. Focusing on investment
promotion policies in China, our experiments demonstrate that DocPolicyKG
significantly outperforms the base model Deepseek-R1-7B and achieves performance competitive
with GPT-4o on both named entity recognition (NER) and relation triplet extraction
(RTE) tasks. Building on DocPolicyKG, we construct the first large-scale knowledge
graph of Chinese investment promotion policies, and further integrate Graph Retrieval-Augmented
Generation (Graph RAG) to support policy question answering through entity-relation
reasoning and semantic retrieval. The graph data is publicly available at: github.com/hanshenmesen/DocPolicyKG.
RAG-based Unanswerable Question Detection in Clinical Text-to-SQL
- Donghee Han
- Seungjae Lim
- Mun Yong Yi
Large language models (LLMs) have shown exceptional performance in various tasks,
particularly in zero-shot and few-shot settings. However, in sensitive domains like
healthcare, detecting unanswerable questions remains a critical challenge. This task
is challenging due to data imbalance, and existing methods are computationally expensive
and inflexible to data distribution changes. To address these issues, we propose Retrieval-augmented
Question Answerability Detection (RaQAD), a training-free method that uses LLMs to identify unanswerable questions by retrieving
semantically similar examples as few-shot prompts. RaQAD ensures semantically similar
sampling, adapts to schema changes, and eliminates the need for additional training.
Extensive experiments on clinical datasets demonstrate its effectiveness in outperforming
existing approaches while addressing data imbalance challenges.
From Bicliques to BiFlexi Cliques: A New Era of Bipartite Subgraph Discovery
- Taejoon Han
- Song Kim
- Woungjae Choo
- Junghoon Kim
Real-world bipartite communities tend to exhibit relaxed internal connectivity as
their size increases, making traditional biclique models too restrictive for cohesive
subgraph discovery. In this paper, we propose the Biflexi, a novel bipartite subgraph
model that employs flexible, size-adaptive degree thresholds based on sublinear constraints.
Our approach dynamically adjusts connectivity requirements according to subgraph size,
enabling the discovery of larger and more realistic cohesive structures. We prove
that the Maximum Biflexi problem is NP-hard and develop an efficient heuristic algorithm.
Experimental results on real-world datasets demonstrate the effectiveness and scalability
of our algorithm and the applicability of our model.
Mixture-of-KAN for Multivariate Time Series Forecasting
- Xiao Han
- Zhenduo Zhang
- Xinfeng Zhang
- Yiling Wu
- Zhe Wu
Multivariate time series forecasting is a crucial task that predicts the future states
based on historical inputs. Although current deep learning-based methods have made
significant advancements, they still face the criticism of lacking interpretability.
The rise of the Kolmogorov-Arnold Network (KAN) provides a new perspective to implement
an efficient and interpretable deep learning-based method for forecasting time series.
However, we find there are two main challenges in the application of KAN in time series
forecasting: how to select the appropriate one from various KAN variants and how to
train the deep KAN-based network. To this end, we propose the multi-layer mixture-of-KAN
network, which achieves excellent performance while retaining KAN's ability to be
transformed into a combination of symbolic functions. The core module is the mixture-of-KAN
layer, which uses a mixture-of-experts structure to assign variables to best-matched
KAN experts. Then, we analyze the shortcomings of parameter initialization in the
original KAN and provide an effective initialization method to alleviate training
instability. Extensive experimental results demonstrate that our proposed method is
effective in multivariate time series forecasting. Code is released at https://github.com/2448845600/EasyTSF.
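The mixture-of-KAN layer described above routes each variable to its best-matched experts. A minimal PyTorch sketch of that routing pattern follows; the experts here are plain linear maps standing in for KAN layers, and the per-variable softmax gate is an assumption about the routing granularity rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class MixtureOfExpertsLayer(nn.Module):
    """Per-variable mixture-of-experts: each input variable (channel) carries its own
    soft assignment over experts. Experts below are linear stand-ins for KAN layers."""
    def __init__(self, seq_len, out_len, n_vars, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(seq_len, out_len) for _ in range(n_experts)])
        self.gate = nn.Parameter(torch.zeros(n_vars, n_experts))  # learned per-variable logits

    def forward(self, x):                                  # x: (batch, n_vars, seq_len)
        weights = torch.softmax(self.gate, dim=-1)         # (n_vars, n_experts)
        outs = torch.stack([e(x) for e in self.experts])   # (n_experts, batch, n_vars, out_len)
        # weight each expert's output per variable and sum over experts
        return torch.einsum("ve,ebvo->bvo", weights, outs)

# usage: layer = MixtureOfExpertsLayer(seq_len=96, out_len=24, n_vars=7)
#        y = layer(torch.randn(32, 7, 96))                 # -> (32, 7, 24)
```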
Information Diffusion Prediction Based on User Multi-Dimensional Feature Interaction
- Jiaxing He
- Yang Fang
- Tianyang Shao
- Xiang Zhao
Information diffusion prediction, the forecasting of propagation paths, provides critical
insights into information spread mechanisms, directly enabling applications such as misinformation
spread forecasting and malicious account detection. Prior research primarily focused
on combining user social graphs and information cascades for prediction, often overlooking
the distinct role characteristics users exhibit during interactions. Classifying users
into different roles enables the construction of a multi-layered social graph, facilitating
the extraction of deeper user features. This paper introduces a model that leverages
multi-dimensional interactions between user features. Specifically, to account for
users' dynamic preferences, we construct sequential hypergraphs from information cascades
using timestamps and utilize a hypergraph neural network to extract users' dynamic
features. Furthermore, to capture users' static features, we build multi-layer social
networks from the social graph based on users' roles. We employ graph convolutional
networks to separately extract static features from each layer and subsequently fuse
them using an attention mechanism. Superior performance of our framework is evidenced
by experimental validation on real-world datasets against cutting-edge benchmarks.
Sparse Autoencoders in Collaborative Filtering Enhanced LLM-based Recommender Systems
- Xinyu He
- Jose Sepulveda
- Fei Wang
- Hanghang Tong
Large language models (LLMs) have demonstrated remarkable capability in recommendation
tasks. Recently, efforts have been made to further enhance LLM performance with collaborative
knowledge learned from traditional recommender systems. One approach is to inject
learned embeddings into LLM prompts through a trainable projector, yet these embeddings
could carry noisy or irrelevant information. In this paper, we propose using sparse
autoencoders to improve input prompts. We show that sparse autoencoders can learn
highly interpretable embeddings and extract key collaborative features in the case
of recommender systems. With the help of sparse autoencoders, we are able to extract
collaborative features to augment input prompts. By capturing the TopK features of each
item, we mitigate noisy information from item embeddings, so sparse autoencoders
can also help denoise the embeddings in prompts. We develop two methods that utilize
sparse autoencoders to augment or denoise input prompts. We evaluate the proposed
methods on three real-world datasets, and both show promising performance improvements.
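A compact sketch of a TopK sparse autoencoder of the kind described above, used to keep only the K strongest latent features of an item embedding before it is projected into the prompt. The dimensions, the ReLU latent, and the untied decoder are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TopKSparseAutoencoder(nn.Module):
    def __init__(self, d_emb, d_latent, k=32):
        super().__init__()
        self.enc = nn.Linear(d_emb, d_latent)
        self.dec = nn.Linear(d_latent, d_emb, bias=False)
        self.k = k

    def forward(self, x):                        # x: (batch, d_emb) item embeddings
        z = torch.relu(self.enc(x))              # non-negative latent features
        topk = torch.topk(z, self.k, dim=-1)     # keep only the K strongest features
        z_sparse = torch.zeros_like(z).scatter_(-1, topk.indices, topk.values)
        return self.dec(z_sparse), z_sparse      # denoised embedding + sparse features

# either the sparse features or the reconstruction can then replace the raw item
# embedding that is projected into the LLM prompt
```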
Assessing Natural Language Explanations of Relational Graph Neural Networks
- Stefan Heindorf
- Daniel Neib
Although relational graph neural networks (RGNNs) excel at learning from graph-structured
data as it appears in knowledge graphs, they often lack interpretability. While natural
language (NL) explanations offer a promising solution, evaluating these explanations
remains largely unaddressed. There is no unified evaluation framework including evaluation
metrics, benchmarking datasets, and established evaluation procedures. This paper
introduces NLEF, a novel NL Evaluation Framework to assess the quality of NL explanations
for RGNNs. It uses the NL explanations to make new predictions and assesses to what
extent they align with the predictions produced by the RGNN. Towards this end, we propose
two methods: (1) we convert NL explanations to description logics (DL) and use a DL
reasoner for node classification, and (2) we use a retrieval-augmented generation (RAG) approach
for node classification. Our evaluation results show that our DL method is highly
scalable, whereas the RAG approach often yields the highest performance.
Structuring Data Science Automation: A Competency-Aware Taxonomy Approach
- Maike Madeline Holtkemper
- Max Pernklau
- Christian Beecks
The growing number of data science (DS) automation frameworks complicates the selection
of suitable tools for project-specific tasks. Current overviews emphasize functional
capabilities or pipeline coverage, yet often overlook how tools align with DS workflows
and user competencies. This paper introduces a first approach toward a competency-focused
taxonomy of DS automation tools. Based on a structured literature review, we identify
which CRISP-DM tasks are automated, how automation is achieved, and which DS competencies
are required for effective use. We classify tools across four dimensions: CRISP-DM
stage, task function, degree of automation, and required competencies. This taxonomy
enables practitioners to match tools with workflow needs and team skill sets, while
also clarifying human-in-the-loop dependencies. We validate the taxonomy through expert
interviews and case mappings, demonstrating its practical value in identifying competency
gaps and guiding framework adoption.
Adaptive Spike Neural Networks for Natural Language Inference Tasks with Dynamic Spike
Predictor
- Seung-Kyu Hong
- Hyuk-Yoon Kwon
Spike Neural Networks offer energy efficiency and are promising candidates for ultra-low-power
inference on neuromorphic hardware. While extensively studied in computer vision,
their application in Natural Language Processing remains limited and underexplored.
Three significant challenges of the existing work are as follows: (1) spike firing
functions are sensitive to initial conditions, (2) spike timings are stochastic even
for identical token inputs, preventing the stable preservation of contextual information,
and (3) the analysis of spike occurrences on learning effectiveness is limited. To
improve learning efficiency and stability, we propose Dynamic Spike Predictor (DSP)
that adaptively regulates spike generation. DSP predicts a scale-adjusted input current
at each time step to regulate spike activity, maintaining stable gradient flow, with
only about 0.2% additional parameters to the backbone SNNs. We validate its effectiveness
through comprehensive experiments on three NLI benchmarks (CB, RTE, and SICK), addressing
research questions on the learning performance, robustness, and extensibility of DSP.
The code is available at https://github.com/bigbases/Spike-Predictor.
Context-Aware Fine-Grained Graph RAG for Query-Focused Summarization
- Yubin Hong
- ChaoFan Li
- Jingyi Zhang
- Yingxia Shao
Retrieval-Augmented Generation (RAG) enables large language models to provide more
precise and pertinent responses by incorporating external knowledge. In the Query-Focused
Summarization (QFS) task, GraphRAG-based approaches have notably enhanced the comprehensiveness
and diversity of generated responses. However, existing GraphRAG-based approaches
lack sufficient fine-grained contextual information during graph retrieval, resulting
in LLMs being unable to accurately understand the detailed and specific background
knowledge of a query. To address this, we propose Context-Aware Fine-Grained Graph RAG
(FG-RAG). On the one hand, FG-RAG employs Context-Aware Entity Expansion in graph
retrieval to provide more contextual information for the retrieved content. On the
other hand, FG-RAG utilizes Query-Level Fine-Grained Summarization to incorporate
fine-grained details during response generation, enhancing query awareness for the
generated summarization. Our evaluation demonstrates that FG-RAG outperforms other
RAG systems in multiple metrics of comprehensiveness, diversity, and empowerment when
handling the QFS task. Our implementation is available at https://github.com/BuptWululu/FG-RAG.
Spatio-Temporal Residual Masked Autoencoder for Urban Rent Estimation
- Chenya Huang
- Bin Liang
- Zhidong Li
- Justin Wang
- Fang Chen
Housing affordability has become a critical issue in many cities, but gaps in rental
transaction records hinder accurate rent estimation. Conventional spatial interpolation
and time-series regression methods either ignore temporal trends or over-smooth spatial
variation, leading to biased rent estimates in under-reported areas. While recent
masked autoencoding techniques for tabular imputation address feature-wise missingness,
they do not explicitly model the joint spatio-temporal structure of urban rent dynamics.
This paper proposes a Spatio-Temporal Residual Masked Autoencoder (ST-ResMAE) that
reconstructs masked rental values by integrating continuous covariates with learnable
spatio-temporal embeddings and by modelling residuals over both space and time. A case
study on rental data from selected Australian urban suburbs from 2020 to 2024 shows
that ST-ResMAE reduces imputation error by 5% relative to recent masked-autoencoding
methods and by 15% relative to traditional regression models. These results demonstrate
ST-ResMAE's ability to capture complex spatio-temporal rent dynamics even when data
are sparse.
XDNet: Disentangled Time Series Forecasting via Exponential Decomposition and 2D Periodic
Modeling
- Kening Huang
- Qianqian Ren
- Xingfeng Lv
In time series analysis, disentangling long-term trends and seasonal patterns is crucial
for capturing multi-scale temporal structures and improving both interpretability
and forecasting accuracy. Recently, 2D modeling techniques have been incorporated
into multivariate forecasting frameworks to better exploit periodic patterns. However,
conventional decomposition methods often rely on simplistic moving averages that obscure
critical patterns, while 2D modeling may entangle global trends with local variations
and fail to normalize seasonal amplitudes, ultimately impairing both interpretability
and forecast accuracy. To overcome these limitations, we propose XDNet (Exponential-Dimensional
Network), a principled forecasting framework that explicitly disentangles trend and
seasonal dynamics. At its core lies the Exponentially Weighted Decomposition (XWD),
which applies decaying weights to past observations to preserve the integrity of long-term
trends while adaptively normalizing seasonal fluctuations. The trend component is
modeled using Temporal Kolmogorov-Arnold Networks (KAN) to capture intricate nonlinear
dynamics, while the seasonal component is processed through a refined Inception-based
module that robustly extracts fine-grained periodic dependencies. Extensive experiments
on multiple benchmark datasets demonstrate that XDNet achieves state-of-the-art forecasting
performance, delivering up to a 2.79% improvement in average accuracy over leading
baselines, particularly in long-horizon prediction tasks.
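The exponentially weighted decomposition (XWD) described above can be illustrated with a short numpy sketch: the trend is an exponentially weighted average of past observations (recent points weighted more, so long-term structure is not smeared out as with a plain moving average), and the seasonal component is the amplitude-normalized residual. The decay factor and the z-score normalization are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def exponentially_weighted_decomposition(x, alpha=0.1, eps=1e-8):
    """x: (T,) series. Returns (trend, seasonal) with the residual z-normalized."""
    x = np.asarray(x, dtype=float)
    trend = np.empty_like(x)
    trend[0] = x[0]
    for t in range(1, len(x)):
        # decaying weights over past observations (EWMA recursion)
        trend[t] = alpha * x[t] + (1 - alpha) * trend[t - 1]
    residual = x - trend
    seasonal = (residual - residual.mean()) / (residual.std() + eps)  # normalize amplitude
    return trend, seasonal
```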
Jailbreaking LLMs Through Alignment Vulnerabilities in Out-of-Distribution Settings
- Yue Huang
- Jingyu Tang
- Dongping Chen
- Bingda Tang
- Yao Wan
- Lichao Sun
- Philip Yu
- Xiangliang Zhang
Recently, Large Language Models (LLMs) have shown remarkable capabilities, but concerns
about their trustworthiness, especially under "jailbreaking" attacks, remain unresolved.
Prior work often assumes white-box access or relies on fixed prompt templates, limiting
practicality. We propose ObscurePrompt, a simple yet effective black-box jailbreak
method inspired by fragile LLM alignment on Out-of-Distribution (OOD) inputs. ObscurePrompt
constructs base prompts using existing jailbreak techniques, then employs powerful
LLMs to iteratively generate obscure variants that evade detection. Extensive experiments
demonstrate that ObscurePrompt outperforms existing methods and remains effective
against two widely-used defenses.
FASE: Feature-Aligned Scene Encoding for Open-Vocabulary Object Detection in Remote
Sensing
- Hyeonsu Hwang
- Simon S. Woo
Open-vocabulary object detection (OVD) in remote sensing (RS) has shown remarkable
generalization capabilities across diverse RS imagery through alignment between image
and text embeddings. Such methods have further improved detection performance by incorporating
additional scene-level context from both visual and textual domains. However, existing
methods approximate scene context by simply averaging the text embeddings of the image's
object labels, which is insufficient to capture the rich linguistic context present
in RS scenes. To address this limitation, we propose a novel Feature-Aligned Scene
Encoding (FASE), which constructs comprehensive scene representations through high-quality
captions generated by a specialized vision-language model. Our Feature Alignment Module
(FAM) creates a robust scene representation by fusing domain-specific caption embeddings
with general text features through dual-branch fusion with gating and cross-attention.
This resulting representation then facilitates the alignment with visual features.
By utilizing enhanced scene encoding only during training, our method internalizes
rich contextual knowledge without increasing inference complexity. Experiments on
multiple benchmarks demonstrate significant improvements over state-of-the-art methods,
validating the effectiveness of our approach for OVD in RS.
Multimodal RAG Enhanced Visual Description
- Amit Kumar Jaiswal
- Haiming Liu
- Ingo Frommholz
Textual descriptions for multimodal inputs entail recurrent refinement of queries
to produce relevant output images. Despite efforts to address challenges such as scaling
model size and data volume, the cost associated with pre-training and fine-tuning
remains substantial. However, pre-trained large multimodal models (LMMs) encounter
a modality gap, characterised by a misalignment between textual and visual representations
within a common embedding space. Although fine-tuning can potentially mitigate this
gap, it is typically expensive and impractical due to the requirement for extensive
domain-driven data. To overcome this challenge, we propose a lightweight training-free
approach utilising Retrieval-Augmented Generation (RAG) to extend across the modality
using a linear mapping, which can be computed efficiently. Our reproducible code can
be found at https://github.com/amitkumarj441/mRAG-gim. During inference, this mapping
is applied to images embedded by an LMM enabling retrieval of closest textual descriptions
from the training set. These textual descriptions, in conjunction with an instruction,
serve as an input prompt for the language model to generate new textual descriptions.
In addition, we introduce an iterative technique for distilling the mapping by generating
synthetic descriptions via the language model, facilitating optimisation for commonly
used image description measures. Experimental results on two benchmark multimodal
datasets demonstrate significant improvements.
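A minimal illustration of the training-free linear mapping between modalities described above: fit a least-squares map from image embeddings to text embeddings on paired data, then use it at inference to retrieve the nearest captions. The array names and the cosine retrieval step are assumptions for illustration.

```python
import numpy as np

def fit_modality_map(img_emb, txt_emb):
    """Least-squares W such that img_emb @ W approximates txt_emb (paired rows)."""
    W, *_ = np.linalg.lstsq(img_emb, txt_emb, rcond=None)
    return W

def retrieve_descriptions(query_img_emb, W, txt_emb, texts, k=5):
    q = query_img_emb @ W                                    # map image into text space
    q = q / (np.linalg.norm(q) + 1e-8)
    t = txt_emb / (np.linalg.norm(txt_emb, axis=1, keepdims=True) + 1e-8)
    idx = np.argsort(-(t @ q))[:k]                           # cosine-nearest captions
    return [texts[i] for i in idx]                           # fed to the LM as a prompt
```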
An Efficient PIM-Based Graph Engine on a Single Machine
- Myung-Hwan Jang
- Min-Kyeong Shin
- Taehyeong Park
- Yongjun Park
- Sang-Wook Kim
With the increasing size of real-world networks, efficient analysis of large-scale
graphs has become an important research area. To this end, we can consider Processing-in-Memory
(PIM), which integrates processing units and main memory into a single chip, as a
promising solution. Many studies have focused on enabling highly efficient processing
of memory-intensive tasks by using PIM's high internal bandwidth. To the best of our
knowledge, however, there have been no studies related to the scenarios where the entire graph does not fit in main memory and data movement across storage, memory, and cache should be considered. Motivated
by this, we propose RealGraphPIM, a new PIM-based graph engine that processes large-scale real-world graphs efficiently
on top of the original RealGraph, a state-of-the-art CPU-based graph engine. RealGraphPIM
employs (1) asynchronous I/O to reduce time wasted in an idle state and (2) column-wise
partitioning to reduce CPU workloads, thereby issuing I/O requests more frequently.
Experimental results on real-world datasets show that RealGraphPIM dramatically outperforms state-of-the-art graph engines, including a naive version
of RealGraphPIM.
Leveraging Large Language Models for Complementary Product Ads Recommendation
- Byung Eun Jeon
- Ryan Bae
- Xiao Bai
Complementary products, which fulfill a joint need (e.g., a phone case for a
smartphone), are often overlooked by dynamic product advertising (DPA) systems despite
their success on e-commerce websites such as Amazon. Existing works on complementary
product recommendation focus on mining frequently co-purchased products but suffer
from low accuracy as co-purchased products are not always complements to each other.
More recent works rely on human annotators to clean co-purchased product pairs and
use them to train end-to-end models for complementary product recommendation. However,
unlike e-commerce websites, DPA systems usually do not have access to users' complete
shopping history, making the identification of co-purchased products challenging.
Moreover, depending on the product types, identifying the complements of a given product
may require extensive domain knowledge that is not present in a pair of complementary
products. In this work, we propose a novel generate-and-retrieve paradigm to make
complementary product recommendations and explore the use of LLMs for this task. Specifically,
we rely on LLMs to generate queries that describe the complements of an original product.
The generated queries are then used to retrieve relevant products from a product index.
The retrieved products are expected to be complementary to the original product. We
design experiments using the public Amazon ESCI datasets and compare in-context learning
with parameter efficient fine-tuning using models from the GPT and Gemini families
for complementary product generation. Our evaluation shows that, by leveraging the
extensive knowledge of LLMs about product relationships, pre-trained LLMs with proper
prompting and only a small number of human-annotated examples outperform LLMs fine-tuned
with tens of thousands of human-annotated examples.
Watch Your Step: A Fine-Grained Evaluation Framework for Multi-hop Knowledge Editing
in Large Language Models
- Geunyeong Jeong
- Juoh Sun
- Harksoo Kim
Knowledge editing allows for targeted updates of specific factual information in Large
Language Models (LLMs). While existing methods can effectively update localized facts,
they often struggle to coherently integrate these updates into the model's broader
knowledge structure. Multi-hop knowledge editing addresses this issue by ensuring that
edited information is consistently reflected throughout the multi-hop reasoning process.
However, current evaluation methods primarily assess the correctness of the final
answer, which cannot guarantee that the edited knowledge has been correctly integrated
into the reasoning process. To address these limitations, we propose a novel evaluation
framework to systematically examine how edited knowledge is integrated within a multi-hop
reasoning process. We introduce three types of entity-level errors: (i) Entity Persistence,
where outdated entities remain; (ii) Entity Mismatch, where unrelated entities appear;
and (iii) Entity Distortion, where entities are morphologically distorted, such as
misspellings or truncations. Our analysis reveals that these errors frequently occur
even when the final answer is correct. Moreover, when the final answer is incorrect,
Entity Mismatch Errors are commonly observed, indicating unintended side effects of
knowledge editing. The code is available at https://github.com/KUNLP/multihop-edit-eval.
M3-Net: A Cost-Effective Graph-Free MLP-Based Model for Traffic Prediction
- Guangyin Jin
- Sicong Lai
- Xiaoshuai Hao
- Jinlei Zhang
- Mingtao Zhang
Achieving accurate traffic prediction is a fundamental yet challenging task in the development
of current intelligent transportation systems, but the complexity of existing deep
learning models poses significant challenges for their efficient deployment and operation on large-scale
datasets. To address these challenges, we propose a cost-effective graph-free Multilayer
Perceptron (MLP) based model M3-Net for traffic prediction. Extensive experiments
conducted on multiple real datasets demonstrate the superiority of the proposed model
in terms of prediction performance and lightweight deployment. Our code is available
at https://github.com/jinguangyin/M3_NET
Imputing Multi-Agent Trajectories from Event and Snapshot Data in Soccer
- Geonhee Jo
- Miru Hong
- Han-Jun Choi
- Minho Lee
- Pascal Bauer
- Sang-Ki Ko
Recent advances in wearable sensors and computer vision technologies have enabled
the collection of tracking data in team sports, which has become a core resource for
fine-grained analysis. However, the availability of tracking data remains constrained
by high acquisition costs and technical limitations. Compared to tracking data, event
data recording on-ball actions and snapshot data providing partial player positions
are more widely accessible. To this end, this study proposes a novel approach to predict
the positions of all players at each event timestamp in soccer matches, leveraging
the limited information available from event and snapshot data. We propose an event-based
imputation model that integrates spatial and temporal attention to capture the spatiotemporal
multi-agent structure. In experiments, we evaluate our model on 13 soccer matches,
achieving average position errors of 5.84 m. To assess the practical utility of our
approach, we apply it to a downstream task called Pitch Control, which requires full
tracking data. These results highlight the potential of event-based position imputation
to expand access to fine-grained analysis in data-constrained settings.
CoCoTen: Detecting Adversarial Inputs to Large Language Models through Latent Space
Features of Contextual Co-occurrence Tensors
- Sri Durga Sai Sowmya Kadali
- Evangelos Papalexakis
The widespread use of Large Language Models (LLMs) in many applications marks a significant
advance in research and practice. However, their complexity and hard-to-understand
nature make them vulnerable to attacks, especially jailbreaks designed to produce
harmful responses. To counter these threats, developing strong detection methods is
essential for the safe and reliable use of LLMs. This paper studies this detection
problem using the Contextual Co-occurrence Matrix, a structure recognized for its
efficacy in data-scarce environments. We propose a novel method leveraging the latent
space characteristics of Contextual Co-occurrence Matrices and Tensors for the effective
identification of adversarial and jailbreak prompts. Our evaluations show that this
approach achieves a notable F1 score of 0.83 using only 0.5% of labeled prompts, which
is a 96.6% improvement over baselines. This result highlights the strength of our
learned patterns, especially when labeled data is scarce. Our method is also significantly
faster, with speedups ranging from 2.3x to 128.4x over the baseline models.
Sarcasm Subtype-Specific Reasoning in Dialogue with Multimodal Cues Using Large Language
Models
- Choongwon Kang
- Wonbyung Lee
- Seunghyun Hwang
- Sunho Tae
- Seungjong Sun
- Jang Hyun Kim
Sarcasm is a nuanced form of human communication characterized by a mismatch between
an utterance and the speaker's intent or contextual cues. Recent studies have aimed
to advance the understanding of sarcasm by developing systems capable of generating
rationales behind sarcastic expressions. In particular, multimodal cues such as facial
expressions and vocal tone have been crucial indicators when semantic incongruity
with the utterance is prominent. However, existing multimodal sarcasm reasoning approaches
fall short of providing fine-grained explanations. Sarcasm can be further categorized
into specific subtypes based on the forms of inversion it employs, such as nonverbal
cues, dialogue context, and exaggerated word emphasis. To address this, we introduce
a novel task called Sarcasm Subtype-specific Reasoning Generation (SSRG). To facilitate
research on this task, we present the Sarcasm Subtype-specific Reasoning Dataset (SSRD),
which establishes a new benchmark for fine-grained sarcasm reasoning. Through extensive
experiments, we demonstrate that leveraging multimodal cues significantly enhances
subtype-specific sarcasm reasoning. Moreover, we show that integrating these multimodal
cues into textual representations enables strong performance even when using only
large language models (LLMs).
When Language Shapes Thought: Cross-Lingual Transfer of Factual Knowledge in Question
Answering
Multilingual large language models (LLMs) offer promising opportunities for cross-lingual
information access, yet their use of factual knowledge remains highly sensitive to
the input language. Prior work has addressed this through English prompting and evaluation,
assuming that English-based reasoning is universally beneficial. In this work, we
challenge that assumption by exploring factual knowledge transfer from non-English
to English through the lens of Language and Thought Theory. We introduce Language-to-Thought
(L2T) prompting, which aligns the model's internal "thinking" language with the
source of knowledge. Across three languages and four models, L2T consistently outperforms
English-based reasoning, reversing the expected advantage of English prompts. Our
code is available at https://github.com/GeomeunByeol/Language2Thought.
Jailbreaking LLMs Through Cross-Cultural Prompts
- Damin Kim
- Minseok Hur
- Jeongin Lee
- Moohong Min
We examine how linguistic and cultural framing affect jailbreak success in three commercial
LLMs (GPT-4, Claude 3, Gemini), using semantically equivalent prompts in direct, indirect, and metaphorical styles
across four high-resource languages. Indirect prompts most effectively bypassed filters,
with framing and style significantly influencing alignment. GPT-4 was especially vulnerable to indirect framing, Claude 3 remained consistently robust, and Gemini showed high sensitivity to cultural and linguistic variation. Our findings highlight
the need for alignment strategies resilient to diverse expression styles and cultural
contexts.
When User Engagement Meets Structural Cohesiveness: A Decay-Driven Approach to Hypergraph
Cores
- Hyewon Kim
- Minseok Kim
- Dahee Kim
- Junghoon Kim
Cohesive subgraph discovery in hypergraphs is essential for analysing complex group
interactions in various domains such as e-commerce, social media, and collaboration
networks. However, existing models are vulnerable to large hyperedges that artificially
inflate connectivity, obscuring meaningful structure. We propose the (k,s)-core, a
new model requiring each node to have at least k neighbours with a minimum interaction
strength s, measured via a size-sensitive decay function. This penalises noisy co-occurrences
while preserving strong local patterns. We develop an efficient algorithm with theoretical
guarantees, and experiments on real-world datasets demonstrate improved compactness
and robustness over prior methods.
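A small sketch of the idea behind the (k,s)-core: each hyperedge contributes to a pair's interaction strength through a size-sensitive decay (here 1/(|e|-1), an illustrative choice), and nodes are iteratively peeled until every remaining node has at least k neighbours with strength at least s. The decay function and the naive peeling loop are assumptions, not the paper's efficient algorithm.

```python
from collections import defaultdict

def ks_core(hyperedges, k, s):
    """hyperedges: list of node sets. Returns the nodes in the (k, s)-core."""
    def strengths(nodes):
        w = defaultdict(float)                        # pairwise interaction strength
        for e in hyperedges:
            e = e & nodes
            if len(e) < 2:
                continue
            decay = 1.0 / (len(e) - 1)                # large hyperedges contribute less
            for u in e:
                for v in e:
                    if u != v:
                        w[(u, v)] += decay
        return w

    nodes = set().union(*hyperedges)
    while True:
        w = strengths(nodes)
        strong = defaultdict(int)                     # neighbours with strength >= s
        for (u, v), val in w.items():
            if val >= s:
                strong[u] += 1
        weak = {u for u in nodes if strong[u] < k}
        if not weak:
            return nodes
        nodes -= weak                                 # peel and recompute
```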
CR-SGCN: Unsupervised Signed Community Detection via Conductance Regularization
Community detection in signed networks is challenging due to the presence of both
positive and negative edges, which violate the homophily assumption commonly used
in traditional methods. In this paper, we present CR-SGCN, an unsupervised framework
for community detection in signed networks. It combines a signed GCN encoder, a soft
community assignment layer, and a degree-corrected stochastic block model decoder.
To enhance boundary separation, we introduce an edge-level signed conductance regularization
that pulls intra-community embeddings closer and pushes inter-community ones apart.
Without requiring labels, CR-SGCN effectively captures community structure even under
edge sparsity. Experiments on real-world signed networks show consistent gains in
signed modularity and structural separation over existing baselines. The results demonstrate
the robustness and effectiveness of CR-SGCN for unsupervised signed community detection.
Upcycling Candidate Tokens of Large Language Models for Query Expansion
- Jinseok Kim
- Sukmin Cho
- Soyeong Jeong
- Sangyeop Kim
- Sungzoon Cho
Query Expansion (QE) improves retrieval performance by enriching queries with related
terms. Recently, Large Language Models (LLMs) have been used for QE, but existing
methods face a trade-off: generating diverse terms boosts performance but increases
computational cost. To address this challenge, we propose Candidate Token Query Expansion
(CTQE), which extracts diverse and relevant terms from a single LLM decoding pass
by leveraging unselected candidate tokens. These tokens, though not part of the final
output, are conditioned on the full query and capture useful information. By aggregating
them, CTQE achieves both relevance and diversity without extra inference, reducing
overhead and latency. Experiments show that CTQE delivers strong retrieval performance
with significantly lower cost, outperforming or matching more expensive methods.
Code is available at: https://github.com/bluejeans8/CTQE
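A sketch of harvesting unselected candidate tokens from a single decoding pass with the Hugging Face transformers API; the prompt wording, the default model name, the number of candidates per step, and the alphabetic filter are illustrative assumptions rather than the paper's exact procedure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def candidate_token_expansion(query, model_name="Qwen/Qwen2.5-0.5B-Instruct",
                              n_candidates=10, max_new_tokens=32):
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    prompt = f"Write a passage that answers the query: {query}\nPassage:"
    inputs = tok(prompt, return_tensors="pt")

    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False,
                         return_dict_in_generate=True, output_scores=True)

    terms = set()
    for step_logits in out.scores:                # one logits tensor per generated step
        top = torch.topk(step_logits[0], n_candidates).indices
        for tid in top.tolist():                  # unselected candidates are kept too
            term = tok.decode([tid]).strip()
            if term.isalpha():                    # crude filter for usable expansion terms
                terms.add(term.lower())
    return query + " " + " ".join(sorted(terms))  # expanded query for retrieval
```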
RadialFocus: Geometric Graph Transformers via Distance-Modulated Attention
- San Kim
- Seungjun Lee
- Sichan Oh
- Jaekwang Kim
Graph Transformers (GTs) excel at long-range reasoning on graphs but often rely on
costly positional encodings or auxiliary virtual nodes to perceive geometry. We present
the RadialFocus Graph Transformer (RadialFocus), a geometry-aware GT that learns to modulate attention with a lightweight, distance-selective kernel. Each head is equipped with
a differentiable radial basis function whose centre μ and width σ are trained end-to-end,
boosting attention between nodes that lie inside its adaptive "focus" while gently
suppressing others. Injecting the logarithm of this kernel into the pre-softmax logits
preserves the stability and permutation invariance of standard self-attention, incurs
negligible memory overhead, and removes the need for hand-crafted 3-D encodings or
virtual nodes. On 3-D molecular benchmarks, RadialFocus attains a validation MAE of
46.3 meV on PCQM4Mv2 with only 13M parameters, surpassing models an order of magnitude
larger. It also sets a new best average ROC-AUC (79.1%) on MoleculeNet and reaches
0.957 MAE on PDBBind2020, a new high-water mark for binding-affinity prediction. The
same architecture transfers to 2-D graphs, achieving 97.8 % accuracy on MNIST-Superpixel.
Ablation studies indicate that the learned (μ, σ) capture task-relevant distance scales
and that log-space fusion stabilises gradients. These findings suggest that a simple,
learned distance modulation suffices to equip Transformers with strong geometric priors,
enabling accurate and parameter-efficient reasoning across diverse graph domains.
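A minimal PyTorch sketch of the distance-modulated attention described above: each head owns a learnable radial basis function over pairwise node distances, and the logarithm of that kernel is added to the pre-softmax attention logits. The tensor shapes, parameter initialization, and the Gaussian form of the kernel are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DistanceModulatedAttention(nn.Module):
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.mu = nn.Parameter(torch.linspace(1.0, 5.0, n_heads))   # per-head focus centre
        self.log_sigma = nn.Parameter(torch.zeros(n_heads))         # per-head focus width

    def forward(self, x, dist):       # x: (B, N, d_model), dist: (B, N, N) pairwise distances
        B, N, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, N, self.n_heads, self.d_head).transpose(1, 2) for t in (q, k, v))

        logits = q @ k.transpose(-2, -1) / self.d_head ** 0.5        # (B, H, N, N)
        sigma = self.log_sigma.exp().view(1, -1, 1, 1)
        mu = self.mu.view(1, -1, 1, 1)
        # log of a Gaussian RBF kernel, added to the pre-softmax logits
        logits = logits - ((dist.unsqueeze(1) - mu) ** 2) / (2 * sigma ** 2)

        attn = logits.softmax(dim=-1)
        return (attn @ v).transpose(1, 2).reshape(B, N, -1)
```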
Learning Short-Term and Long-Term Patterns of High-Order Dynamics in Real-World Networks
- Yunyong Ko
- Da Eun Lee
- Song Kyung Yu
- Sang-Wook Kim
Real-world networks have high-order relationships among objects and they evolve over
time. To capture such dynamics, many studies have been conducted across a range of fields.
Via an in-depth preliminary analysis, we observe two important characteristics of
high-order dynamics in real-world networks: high-order relations tend to (O1) have
a structural and temporal influence on other relations in the short term and (O2) periodically
re-appear in the long term. In this paper, we propose LINCOLN, a method for Learning
hIgh-order dyNamiCs Of reaL-world Networks, that employs (1) bi-interactional hyperedge
encoding for short-term patterns, (2) periodic time injection and (3) intermediate
node representation for long-term patterns. Via extensive experiments, we show that
LINCOLN outperforms nine state-of-the-art methods in the dynamic hyperedge prediction
task.
Side Information Memory Network: Expanding the Breadth of User Behavior Sequences
in Recommendation
- Zhoufan Kong
- Fan Zhang
- Qijie Shen
- Junyan Qiu
Research on sequence-based ranking models has been a popular field in recommendation
systems. In recent years, numerous researchers have devoted themselves to expanding
the content of user behavior sequences, such as longer sequences, more types of sequences,
and more variable sequence periods. These studies have achieved promising results,
especially in the direction of longer sequences, where a large number of industrial
recommendation systems have demonstrated that longer sequences can lead to better
performance. However, as the sequence length approaches the upper limit of user behavior
occurrences, the marginal benefit of increasing sequence length is gradually diminishing.
Against this backdrop, this paper proposes a sequence expansion framework based on
Side Information Memory Network (SIMN). With SIMN, in theory, all item-side features can be incorporated
into the sequence while avoiding additional sample development and storage
costs. Furthermore, considering the application of this framework in small to medium-sized
recommendation systems, this paper proposes a Feature Auto Encoder-Decoder (FAED) module, which further reduces the storage cost of SIMN. This paper integrates
SIMN and FAED into a unified multitask training framework for modeling and validates
it on two industrial datasets. Experimental results demonstrate that SIMN-FAED can
be integrated with most mainstream sequence modeling methods and achieve better performance,
with broad application prospects.
On Evaluating Loss Functions for Stock Ranking: An Empirical Analysis with Transformer
Model
- Jan Kwiatkowski
- Jarosław Chudziak
Quantitative trading strategies rely on accurately ranking stocks to identify profitable
investments. Effective portfolio management requires models that can reliably order
future stock returns. Transformer models are promising for understanding financial
time series, but how different training loss functions affect their ability to rank
stocks well is not yet fully understood. Financial markets are challenging due to
their changing nature and complex relationships between stocks. Standard loss functions,
which aim for simple prediction accuracy, are often not enough, as they do not directly
teach models to learn the correct order of stock returns. While many advanced ranking
losses exist in fields such as information retrieval, there has not been a thorough
comparison to see how well they work for ranking financial returns, especially when
used with modern Transformer models for stock selection. This paper addresses this
gap by systematically evaluating a diverse set of advanced loss functions, including
pointwise, pairwise, and listwise losses, for daily stock return forecasting to facilitate rank-based
portfolio selection on S&P 500 data. We focus on assessing how each loss function
influences the model's ability to discern profitable relative orderings among assets.
Our research contributes a comprehensive benchmark revealing how different loss functions
impact a model's ability to learn cross-sectional and temporal patterns crucial for
portfolio selection, thereby offering practical guidance for optimizing ranking-based
trading strategies.
Mitigating Knowledge Degradation Caused by Knowledge Editing on Identical Subjects
through Two-Step Editing
- Seonghee Lee
- Geon Park
- Geunyeong Jeong
- Juoh Sun
- Harksoo Kim
Large Language Models (LLMs) acquire extensive factual knowledge from large-scale
datasets and demonstrate remarkable performance across various tasks. However, since
real-world knowledge is constantly changing, it is necessary to modify or expand the
model's knowledge. To achieve this, knowledge editing techniques are employed to correct
inaccurate or outdated information and inject new knowledge, thereby ensuring that
the model remains current. However, in existing subject-centered editing approaches,
repeatedly editing the same subject can lead to knowledge degradation, where previously
edited knowledge is forgotten. In this paper, we analyze the causes of this knowledge
degradation phenomenon and propose a two-step editing method that independently edits
subjects and relations to mitigate this issue. Our method effectively alleviates knowledge
degradation compared to existing knowledge editing techniques, achieving average performance
improvements of 22.9% in multi-edit scenarios and 7.2% in sequential editing.
Spectral Edge Encoding - SEE: Does Structural Information Really Enhance Graph Transformer
Performance?
- Seungjun Lee
- San Kim
- Johyeon Kim
- Jaekwang Kim
We propose Spectral Edge Encoding (SEE), a parameter-free framework that quantifies
each edge's contribution to the global structure by measuring spectral shifts in the
Laplacian eigenvalues. SEE captures the low-frequency sensitivity of edges and integrates
these scores into graph Transformer attention logits as a structure-aware bias. When
applied to the Moiré Graph Transformer (MoiréGT) and evaluated on seven MoleculeNet
classification benchmarks, SEE consistently improves ROC-AUC performance. In particular,
MoiréGT+SEE achieves an average ROC-AUC of 85.3%, approximately 7.1 percentage points
higher than the previous state-of-the-art model UniCorn (78.2%). Moreover, SEE preserves
molecular topology and enables edge-level interpretability, offering a practical alternative
to sequence-based chemical language models. These results demonstrate that spectrum-informed
attention can simultaneously enhance performance and transparency in graph-based molecular
modeling.
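One way to read the edge scoring described above is as a perturbation test on the Laplacian spectrum: remove an edge and measure how much the low-frequency eigenvalues shift, then inject that shift as a bias on the corresponding attention logit. The numpy sketch below follows that reading; the edge-removal perturbation, the number of eigenvalues kept, and the dense eigendecomposition are assumptions for illustration.

```python
import numpy as np

def laplacian(adj):
    return np.diag(adj.sum(axis=1)) - adj

def spectral_edge_scores(adj, n_low=8):
    """adj: (N, N) symmetric 0/1 adjacency. Returns {(u, v): low-frequency spectral shift}."""
    base = np.sort(np.linalg.eigvalsh(laplacian(adj)))[:n_low]
    scores = {}
    for u, v in zip(*np.triu_indices_from(adj, k=1)):
        if adj[u, v] == 0:
            continue
        pert = adj.copy()
        pert[u, v] = pert[v, u] = 0                      # remove the edge
        ev = np.sort(np.linalg.eigvalsh(laplacian(pert)))[:n_low]
        scores[(int(u), int(v))] = float(np.abs(base - ev).sum())
    return scores

# the score of edge (u, v) can then be added as a structure-aware bias to the
# attention logit between nodes u and v
```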
Deceptive Synthetic Updates: Stealth Free-Rider Attack on Model Aggregation in Federated
Learning
- Youngjoon Lee
- Jinu Gong
- Joonhyuk Kang
Federated Learning (FL) allows multiple clients to collaboratively train shared models
without exchanging raw data, thereby preserving privacy. However, FL systems are vulnerable
to malicious participants known as free-riders who exploit the collaborative nature
without providing genuine data contributions. To expose this critical security threat,
we introduce a novel stealth free-rider attack that leverages pre-trained forecasting
models to generate highly realistic synthetic time-series data. Our approach enables
malicious clients to deceive FL systems while obtaining benefits from fair participants'
contributions, thereby undermining the integrity of federated networks. Numerical
results on EEG-based sleep stage classification demonstrate that our attack maintains
comparable performance with free-rider ratios up to 70% while causing catastrophic
degradation when all clients are free-riders.
EI-KGC: A Knowledge Graph Completion Model Based on Fine-Grained Element Interactions
- Dong Li
- Lingling Zhang
- Yuhang Fan
- Jingyou Sun
- Xinyu Zhang
- Baoyan Song
Most existing knowledge graph completion methods fail to model the fine-grained interactions
among elements within triples, such as dependencies between entity attributes or contextual
relationships involving predicates and entities. This limitation weakens their ability
to infer implicit knowledge and hinders overall reasoning performance. To address
this issue, we define a three-level classification of element interactions: Interactions
between Elements at the Head Entity (IEH), Interactions between Elements at the Relationship
(IER), and Interactions between Elements at the Tail Entity (IET), which systematically
models the influence propagation patterns among knowledge graph triples at the element
level. Based on these interaction types, we propose a novel Knowledge Graph Completion
Model Based on Fine-Grained Element Interactions (EI-KGC). Our model captures both
global structural patterns and semantic dependencies within triples by combining GNN
propagation with fine-grained interaction modeling. Experimental results show that
the EI-KGC consistently outperforms traditional baseline models, demonstrating the
effectiveness of our proposed model.
Improving Content Anomaly Detection on Social Media via Counterfactual Mitigation
of Social Event-Induced Bias
Content anomaly detection on social media (SNS) plays a critical role in maintaining
healthy online communities. However, the prevalence of major real-world social events
can introduce model bias that skews, narrows, and undermines a detector's ability
to perceive anomalies in content by contaminating and homogenizing the expressions posted
during these events. To address this challenge, we propose SeiNS, a novel, model-agnostic
plugin designed to mitigate the bias induced by prevalent social events. SeiNS comprises
two components: a Social Event Extractor and an Event Involvement Perceptor. For a
given post, SeiNS removes the bias by treating as spurious both the correlation
between the label and the social event-induced homogenized content and the
correlation between the label and the event-involvement representation, and by
performing counterfactual learning on them during fine-tuning. We evaluate SeiNS on
two benchmark datasets composed of real SNS posts under various major social events,
where content anomaly respectively plays a negative and a positive role for social
good. The results show that SeiNS significantly improves the robustness of SNS content
anomaly detectors in dynamic, even unstable, online social environments.
Dynamic Reserve Price Design with Distributed Solving Algorithm
Unexpected advertising items in sponsored search may reduce users' reliance on organic
search, resulting in hidden cost for the e-commerce platform. To address this problem
and promote sustainable growth, we propose a dynamic reserve price design that incorporates
the hidden cost into the auction mechanism to determine whether to sell the traffic,
thereby ensuring a balanced relationship between revenue and user experience. Our
dynamic reserve price design framework optimizes traffic sales by minimizing impacts
on user experience while maintaining long-term incentives for advertisers to reveal
their valuations truthfully. Furthermore, we introduce a distributed algorithm capable
of computing reserve prices with billion-scale data in the production environment.
Experiments involving offline evaluations and online A/B testing demonstrate that
this method is simple and efficient, making it suitable for use in industrial production.
This method has already been fully deployed in the production environment.
Improving Rare and Common ICD Coding via a Multi-Agent LLM-Based Approach
- Rumeng Li
- Xun Wang
- Hong Yu
Large Language Models (LLMs) have shown strong performance in tasks such as zero-
and few-shot information extraction from clinical text without domain-specific training.
However, in the ICD coding task, LLMs often hallucinate key details and produce high-recall
but low-precision outputs due to the high-dimensional and imbalanced nature of ICD
code distributions. Existing LLM-based approaches typically fail to capture the complex,
dynamic interactions among human agents involved in real-world coding workflows, such
as patients, physicians, and coders, and often lack interpretability and reliability.
To address these challenges, we propose a novel multi-agent framework for ICD coding
that simulates the real-world process using five role-specific LLM agents (patient,
physician, coder, reviewer, and adjuster) and integrates the Subjective, Objective,
Assessment, and Plan (SOAP) structure from Electronic Health Records to enhance performance.
Evaluated on the MIMIC-III dataset, our method significantly outperforms zero-shot
Chain-of-Thought prompting, self-consistency strategies, and LLM-designed agent baselines,
particularly for rare codes. Ablation studies confirm the contribution of each agent
role, and the system achieves competitive performance with state-of-the-art fine-tuned
models, while offering better explainability and requiring no task-specific pre-training.
Position-Agnostic Probabilistic Generation for Robust Steganographic Text
- Shuo Lin
- Zhenyang Shen
- Xiang Li
- Jiacheng Fan
- Sixing Wu
With the rise of linguistic steganography, robustness to minor perturbations
such as textual edits or tokenization shifts remains a critical challenge.
To address this, we propose a novel robustness-enhancing method, Position-Agnostic Probabilistic Generation, which combines semantic clustering with probabilistic generation control. During
decoding, the model is softly guided to prefer or avoid specific token sets through
time-aware probability boosting, enabling robust bit embedding without relying on
fixed token positions. These token sets are constructed from semantically coherent
clusters derived from the language model's vocabulary and expanded to ensure fluency.
A dynamic, time-aware boosting strategy is then applied to gradually amplify the likelihood
of valid tokens throughout the generation. Experimental results demonstrate that our
method consistently outperforms baselines in preserving hidden information under 10
types of perturbations.
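The probability-boosting step described above can be sketched as a logit bias over a bit-specific token set that grows with the decoding step; the token-set construction, the boost schedule, and the sampling call are all illustrative assumptions, not the paper's exact mechanism.

```python
import torch

def boosted_next_token(logits, allowed_ids, step, max_steps, max_boost=4.0):
    """logits: (vocab,) next-token logits; allowed_ids: token ids of the cluster that
    encodes the current hidden bit. The boost ramps up over time so fluency is kept
    early on while the valid tokens become increasingly likely."""
    boost = max_boost * min(1.0, step / max_steps)      # time-aware schedule (assumed form)
    biased = logits.clone()
    biased[allowed_ids] += boost                        # softly prefer the bit's token set
    return torch.multinomial(torch.softmax(biased, dim=-1), 1).item()
```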
Token-Fusion: A Sparse Expert Routing Method for Multi-task Data Matching
Multi-task data matching, including entity matching, entity linking, and schema matching, is
a fundamental task in data integration, yet it remains challenging due to heterogeneous
inputs and task-specific model designs. We propose Token-Fusion, a unified sparse
expert method that integrates token-level dynamic expert routing, adaptive expert
pool management, and a fusion strategy guided by both confidence and performance gain.
Meanwhile, regularization losses are designed to encourage sparse and diverse expert
activation for improved efficiency. Our extensive experimental evaluation on six public
datasets demonstrates the effectiveness and efficiency of Token-Fusion in handling
heterogeneous matching tasks, establishing it as a promising solution for unified
and scalable multi-task data matching.
Few-Shot Knowledge Graph Completion via Transfer Knowledge from Similar Tasks
- Lihui Liu
- Zihao Wang
- Dawei Zhou
- Ruijie Wang
- Yuchen Yan
- Bo Xiong
- Sihong He
- Hanghang Tong
Knowledge graphs (KGs) are essential in many AI applications but often suffer from
incompleteness, limiting their utility. Many relations in KGs have only a few examples,
making it challenging to train accurate models. Few-shot learning offers a promising
direction by enabling KG completion with only a small number of training triplets.
However, most existing approaches treat each relation independently and fail to leverage
shared information across tasks. In this paper, we introduce TransNet, a transfer learning method for few-shot KG completion that captures task relationships
and reuses knowledge from related tasks. TransNet further incorporates meta-learning to effectively handle unseen relations. Experiments
on standard benchmarks demonstrate that TransNet achieves strong performance compared to prior methods. Code and data will be released
upon acceptance.
Monte Carlo Tree Search for Graph Reasoning in Large Language Model Agents
While large language models (LLMs) have achieved impressive results across many tasks,
they remain prone to hallucinations, particularly in domains requiring substantial
background knowledge. A common way to mitigate this issue is to incorporate external
knowledge, often through retrieval-augmented generation (RAG). However, most existing
RAG approaches focus solely on textual data and neglect an important aspect: the connections
between pieces of knowledge. In domains such as scientific publishing, entities like
papers, authors, and citations form rich graphs, where meaning emerges not only from
individual texts but also from their relationships. To address this, we propose Graph-MCTS,
a framework that enhances LLM reasoning by leveraging graph structures. Graph-MCTS
uses Monte Carlo Tree Search (MCTS) to guide the model through structured exploration
of graph-based knowledge. We evaluate Graph-MCTS across multiple LLM architectures
and find that it consistently outperforms existing augmentation methods. These findings
highlight the importance of structured, relational knowledge for improving the reasoning
capabilities of LLMs. Code is available at https://github.com/lihuiliullh/Graph-MCTS
Powering Job Search at Scale: LLM-Enhanced Query Understanding in Job Matching Systems
- Ping Liu
- Jianqiang Shen
- Qianqi Shen
- Chunnan Yao
- Kevin Kao
- Dan Xu
- Rajat Arora
- Baofen Zheng
- Caleb Johnson
- Liangjie Hong
- Jingwei Wu
- Wenjing Zhang
Query understanding is essential in modern relevance systems, where user queries are
often short, ambiguous, and highly context-dependent. Traditional approaches often
rely on multiple task-specific Named Entity Recognition models to extract structured
facets as seen in job search applications. However, this fragmented architecture is
brittle, expensive to maintain, and slow to adapt to evolving taxonomies and language
patterns. In this paper, we introduce a unified query understanding framework powered
by a Large Language Model (LLM), designed to address these limitations. Our approach
jointly models the user query and contextual signals such as profile attributes to
generate structured interpretations that drive more accurate and personalized recommendations.
The framework improves relevance quality in online A/B testing while significantly
reducing system complexity and operational overhead. The results demonstrate that
our solution provides a scalable and adaptable foundation for query understanding
in dynamic web applications.
Sequential Difference Maximization: Generating Adversarial Examples via Multi-Stage
Optimization
- Xinlei Liu
- Tao Hu
- Peng Yi
- Weitao Han
- Jichao Xie
- Baolin Li
Efficient adversarial attack methods are critical for assessing the robustness of
computer vision models. In this paper, we reconstruct the optimization objective for
generating adversarial examples as "maximizing the difference between the non-true
labels' probability upper bound and the true label's probability," and propose a
gradient-based attack method termed Sequential Difference Maximization (SDM). SDM establishes a three-layer optimization framework of "cycle-stage-step." The
processes across cycles and across iterative steps are identical, while the
optimization stages differ in their loss functions: in the initial stage, the negative
probability of the true label is used as the loss function to compress the solution
space; in subsequent stages, we introduce the Directional Probability Difference Ratio (DPDR) loss function to gradually increase the non-true labels' probability upper bound
by compressing the irrelevant labels' probabilities. Experiments demonstrate that
compared with previous SOTA methods, SDM not only exhibits stronger attack performance
but also achieves higher attack cost-effectiveness. Additionally, SDM can be combined
with adversarial training methods to enhance their defensive effects. The code is
available at https://github.com/X-L-Liu/SDM.
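As a rough illustration of the stage-switched optimization described above, the sketch below implements a PGD-style loop that suppresses the true-label probability in the first stage and then maximizes a gap-ratio surrogate in later stages. The surrogate loss, step sizes, and stage counts are assumptions of this sketch, not the paper's exact DPDR loss.

```python
import torch
import torch.nn.functional as F

def sdm_style_attack(model, x, y, eps=8/255, alpha=2/255, steps_per_stage=10, stages=3):
    # Stage-switched gradient ascent: stage 0 suppresses the true-label probability,
    # later stages widen the gap to the top non-true label (surrogate loss, see lead-in).
    x_adv = x.clone().detach()
    for stage in range(stages):
        for _ in range(steps_per_stage):
            x_adv.requires_grad_(True)
            probs = F.softmax(model(x_adv), dim=1)
            p_true = probs.gather(1, y.unsqueeze(1)).squeeze(1)
            p_top_nontrue = probs.scatter(1, y.unsqueeze(1), 0.0).max(dim=1).values
            if stage == 0:
                loss = (-p_true).mean()                            # compress the solution space
            else:
                p_rest = (1.0 - p_true - p_top_nontrue).clamp(min=1e-12)
                loss = ((p_top_nontrue - p_true) / p_rest).mean()  # gap-ratio surrogate
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() + alpha * grad.sign()           # ascend on the loss
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0).detach()
    return x_adv
```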
Exploring Reasoning-Infused Text Embedding with Large Language Models for Zero-Shot
Dense Retrieval
- Yuxiang Liu
- Tian Wang
- Gourab Kundu
- Tianyu Cao
- Guang Cheng
- Zhen Ge
- Jianshu Chen
- Qingjun Cui
- Trishul Chilimbi
Transformer-based models such as BERT and E5 have significantly advanced text embedding
by capturing rich contextual representations. However, many complex real-world queries
require sophisticated reasoning to retrieve relevant documents beyond surface-level
lexical matching, where encoder-only retrievers often fall short. Decoder-only large
language models (LLMs), known for their strong reasoning capabilities, offer a promising
alternative. Despite this potential, existing LLM-based embedding methods primarily
focus on contextual representation and do not fully exploit the reasoning strength
of LLMs. To bridge this gap, we propose Reasoning-Infused Text Embedding (RITE), a simple but effective approach that integrates logical reasoning into the text
embedding process using generative LLMs. RITE builds upon existing language model
embedding techniques by generating intermediate reasoning texts in the token space
before computing embeddings, thereby enriching representations with inferential depth.
Experimental results on BRIGHT, a reasoning-intensive retrieval benchmark, demonstrate that RITE significantly enhances
zero-shot retrieval performance across diverse domains, underscoring the effectiveness
of incorporating reasoning into the embedding process.
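The core idea, generating intermediate reasoning text before embedding, can be sketched as follows. Here `generate` and `embed` are hypothetical stand-ins for an instruction-tuned LLM and an LLM-based embedder, and applying the expansion only to queries is an assumption of this sketch rather than the paper's prescription.

```python
def rite_embed_query(query: str, generate, embed):
    # Step 1: let the LLM reason about what evidence a relevant document would contain.
    reasoning = generate(
        f"Question: {query}\n"
        "Briefly reason step by step about what information a relevant document "
        "would need to contain, then summarize it."
    )
    # Step 2: embed the query enriched with the intermediate reasoning text.
    return embed(f"{query}\n\nReasoning: {reasoning}")

def rite_embed_document(doc: str, embed):
    # Documents are embedded as-is; only queries receive the reasoning expansion here.
    return embed(doc)
```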
CoinCLIP: A Multimodal Framework for Assessing Viability in Web3 Memecoins
- Hou-Wan Long
- Hongyang Li
- Wei Cai
The rapid growth of memecoins within the Web3 ecosystem, driven by platforms like
Pump.fun, has made it easier for anyone to create tokens. However, this democratization
has also led to an explosion of low-quality or bot-generated projects, often motivated
by short-term financial gain. This overwhelming influx of speculative tokens creates
a challenge in distinguishing viable memecoins from those that are unlikely to succeed.
To address this issue, we introduce CoinVibe, a comprehensive multimodal dataset designed
to evaluate the viability of memecoins. CoinVibe integrates textual descriptions,
visual content (logos), and community data (user comments, timestamps, and number
of likes) to provide a holistic view of a memecoin's potential. In addition, we present
CoinCLIP, a novel framework that leverages the Contrastive Language-Image Pre-Training
(CLIP) model, augmented with lightweight modules and community data integration, to
improve classification accuracy. By combining visual and textual representations with
community insights, CoinCLIP provides a robust, data-driven approach to filter out
low-quality or bot-driven projects. This research aims to help creators and investors
identify high-potential memecoins, while also offering valuable insights into the
factors that contribute to their long-term success.
KP-Agent: Keyword Pruning in Sponsored Search Advertising via LLM-Powered Contextual
Bandits
- Hou-Wan Long
- Yicheng Song
- Zidong Wang
- Tianshu Sun
Sponsored search advertising (SSA) requires advertisers to constantly adjust keyword
strategies. While bid adjustment and keyword generation are well-studied, keyword
pruning-refining keyword sets to enhance campaign performance-remains underexplored.
This paper addresses critical inefficiencies in current practices as evidenced by
a dataset containing 0.5 million SSA records from a pharmaceutical advertiser on the search
engine of Meituan, China's largest delivery platform. We propose KP-Agent, an LLM-based agentic
system with a domain-specific tool set and a memory module. By modeling keyword pruning within
a contextual bandit framework, KP-Agent generates code snippets to refine keyword
sets through reinforcement learning. Experiments show KP-Agent improves cumulative
profit by up to 49.28% over baselines.
ORCA: Mitigating Over-Reliance for Multi-Task Dwell Time Prediction with Causal Decoupling
- Huishi Luo
- Fuzhen Zhuang
- Yongchun Zhu
- Yiqing Wu
- Bo Kang
- Ruobing Xie
- Feng Xia
- Deqing Wang
- Jin Dong
Dwell time (DT) is a critical post-click metric for evaluating user preference in
recommender systems, complementing the traditional click-through rate (CTR). Although
multi-task learning is widely adopted to jointly optimize DT and CTR, we observe that
multi-task models systematically collapse their DT predictions to the shortest and
longest bins, under-predicting the moderate durations. We attribute this moderate-duration bin under-representation to over-reliance on the CTR-DT spurious correlation, and propose ORCA to address
it via causal decoupling. Specifically, ORCA explicitly models and subtracts CTR's
negative transfer while preserving its positive transfer. We further introduce (i)
feature-level counterfactual intervention, and (ii) a task-interaction module with
instance inverse-weighting, weakening CTR-mediated effect and restoring direct DT
semantics. ORCA is model-agnostic and easy to deploy. Experiments show an average
10.6% lift in DT metrics without harming CTR. Code is available at https://github.com/Chrissie-Law/ORCA-Mitigating-Over-Reliance-for-Multi-Task-Dwell-Time-Prediction-with-Causal-Decoupling.
Anchor-based Pairwise Comparison via Large Language Model for Recommendation Reranking
- Qin Luo
- Erjia Chen
- Zhao Shi
- Bang Wang
In recommender systems, reranking is an important post-processing technique to reorder
the items in a recommendation list. Recently, some LLM-based reranking approaches
have been proposed to enjoy the semantic reasoning capability of a large language
model. However, they are sensitive to the order of the input list and often incur
large computational overheads. To address these limitations, we propose APCR, an Anchor-based Pairwise Comparison method for recommendation Reranking. APCR first leverages an LLM to conduct pairwise comparisons
between the recommended items and an anchor, and computes preference scores
to produce a new LLM-suggested list. We next propose a position-aware list reranking
technique to reorder the items in the recommendation list by considering their positions
in the LLM-suggested list to produce the final list. Experiments on real-world datasets
show that our APCR outperforms the state-of-the-art LLM-based reranking techniques
in terms of better list ranking performance.
Bridging the Gap between Knowledge Graphs and LLMs for Multi-hop Question Answering
- Shijie Luo
- Xinyuan Lu
- Qinpei Zhao
- Weixiong Rao
To achieve multi-hop question answering over knowledge graphs (KGQA), many studies
have explored converting retrieved subgraphs into textual form and feeding them into
large language models (LLMs) to leverage their reasoning capabilities. However, due
to the linear and discrete nature of text sequences, model performance may degrade
when handling complex questions. To address this, we propose a novel structure-text knowledge
synergistic method, BrikQA, which bridges the knowledge gap between knowledge graphs (KGs) and LLMs for multi-hop KGQA. LLMs and KGs complement each other by leveraging explicit topological patterns and
implicit knowledge mining to enhance knowledge understanding and address sparsity
issues. Experimental results on various datasets demonstrate that BrikQA outperforms
state-of-the-art baselines. Our source code is available at https://github.com/shijielaw/BrikQA.
From Post To Personality: Harnessing LLMs for MBTI Prediction in Social Media
- Tian Ma
- Kaiyu Feng
- Yu Rong
- Kangfei Zhao
Personality prediction from social media posts is a critical task that implies diverse
applications in psychology and sociology. The Myers-Briggs Type Indicator (MBTI),
a popular personality inventory, has been traditionally predicted by machine learning
(ML) and deep learning (DL) techniques. Recently, the success of Large Language Models
(LLMs) has revealed their huge potential in understanding and inferring personality
traits from social media content. However, directly exploiting LLMs for MBTI prediction
faces two key challenges: the hallucination problem inherent in LLMs and the naturally
imbalanced distribution of MBTI types in the population. In this paper, we propose
PostToPersonality (P2P), a novel LLM-based framework for MBTI prediction from social
media posts of individuals. Specifically, P2P leverages Retrieval-Augmented Generation
with in-context learning to mitigate hallucination in LLMs. Furthermore, we fine-tune
a pre-trained LLM to improve model specification in MBTI understanding with synthetic
minority oversampling, which balances the class imbalance by generating synthetic
samples. Experiments conducted on a real-world social media dataset demonstrate that
P2P achieves state-of-the-art performance compared with 10 ML/DL baselines.
Image Hashing Based on Hamming Ball Spacing
- Zhaomeng Ma
- Haoran Chang
- Hailong Shen
- Yanzhi Song
Image hashing has been widely used in large-scale image retrieval due to its advantages
in storage efficiency and retrieval speed. Recently, methods based on hash centers have
achieved impressive retrieval performance, aiming to assign mutually separated hash
codes as center points for each category and learn compact binary codes by minimizing
intra-class variance. However, current methods tend to generate codebooks in advance,
which are then used as pre-defined hash centers, leading to compromised quality of
hash centers in instance-level datasets with excessive categories. To address this
problem, we designed a training strategy that learns hash centers by constraining
the margin of category clusters in Hamming space, during which the Gilbert-Varshamov
bound and the upper bound of Hamming ball packing from coding theory are utilized
to determine the range of margins. Finally, we jointly optimize hash encoder and hash
centers to improve retrieval performance. Extensive experiments on more challenging
large-scale instance-level datasets demonstrate that our method effectively overcomes
the limitation of retrieval performance imposed by the number of categories. Code
is at: https://github.com/GrimmAI/Hamming-Ball-Hashing.git.
kNNBE: Incorporating Labeled Sentences in Bi-encoder Inference for Fast and Accurate
Skill Mapping
Skill mapping is a key task in the Human Resources domain. It consists in identifying
ontology-defined skills in job texts. Among the most successful approaches applied
to skill mapping, bi-encoders offer efficient inference but struggle with fine-grained
skill distinctions, particularly under limited supervision. While accurate, cross-encoder
and LLM-based reranking approaches are computationally expensive and usually infeasible
to adopt in real-world scenarios. We propose kNNBE, a hybrid inference method
that augments bi-encoder similarity scores with k-nearest labeled sentences drawn
from a synthetic memory bank. kNNBE improves both prediction accuracy and generalization
to unseen skills while retaining high throughput. Extensive experiments on three benchmark
datasets show that kNNBE rivals state-of-the-art rerankers in accuracy while being
orders of magnitude faster.
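A minimal sketch of the hybrid scoring idea follows, assuming the bi-encoder score and a soft k-nearest-neighbour vote over the labeled memory bank are combined by linear interpolation; the interpolation weight and temperature are illustrative, not the paper's exact formulation.

```python
import numpy as np

def knnbe_scores_sketch(query_emb, skill_embs, memory_embs, memory_skill_ids,
                        k=8, lam=0.5, temperature=0.05):
    # query_emb:        (d,) L2-normalized sentence embedding
    # skill_embs:       (S, d) embeddings of ontology skill labels
    # memory_embs:      (M, d) embeddings of labeled sentences in the memory bank
    # memory_skill_ids: (M,) skill index for each memory sentence
    base = skill_embs @ query_emb                      # bi-encoder similarity per skill

    # Retrieve the k most similar labeled sentences and turn their similarities
    # into a soft distribution over skills.
    sims = memory_embs @ query_emb
    top = np.argsort(-sims)[:k]
    weights = np.exp(sims[top] / temperature)
    weights /= weights.sum()
    knn = np.zeros_like(base)
    for w, m in zip(weights, top):
        knn[memory_skill_ids[m]] += w

    # Interpolate the two signals.
    return lam * base + (1.0 - lam) * knn
```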
Leveraging Intra-Modal Consistency for Cross-Modal Alignment and Retrieval
- Fengyang Mao
- Xiaodong Yue
- Yufei Chen
- Jiaqing Ma
- Zheran Zhang
- Jie Shi
Cross-modal retrieval aims to match videos and texts by mapping them into a shared
feature space. Most existing approaches achieve alignment through contrastive learning
based on one-to-one supervised pairs. However, these methods rely too much on supervised
signals and do not fully use the unsupervised semantic relationships within each modality.
As a result, semantically similar samples may be spread far apart in the shared space,
which hurts retrieval performance. To solve this problem, we propose a method called
Leveraging Intra-Modal Consistency for Cross-Modal Alignment and Retrieval (LICA). Our method introduces a consistency constraint between intra-modal similarities
and cross-modal similarity distributions. In this way, samples that are close in meaning
stay closer together in the shared space. Experiments on standard text-video retrieval
benchmarks show that LICA helps optimize the distribution of the cross-modal feature
space and improves retrieval accuracy.
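One plausible form of such a consistency constraint is sketched below: the cross-modal similarity distribution of each video over the texts in a batch is pulled towards its intra-modal similarity distribution over the videos. The KL formulation and temperature are assumptions of this sketch, not LICA's exact loss.

```python
import torch.nn.functional as F

def intra_modal_consistency_loss(video_emb, text_emb, tau=0.05):
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    intra = F.softmax(v @ v.t() / tau, dim=-1)         # (B, B) video-video structure
    cross = F.log_softmax(v @ t.t() / tau, dim=-1)     # (B, B) video-text alignment
    # KL(intra || cross): penalize cross-modal neighborhoods that break intra-modal ones.
    return F.kl_div(cross, intra, reduction="batchmean")
```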
Multi-Task Learning through Hierarchical Information Sharing and Transfer
- Yufan Mao
- Jingran Xu
- Liang Zhang
- Xiyue Hou
- Yongbo Jin
- Yingming Li
- Linjian Mo
In this work, we propose a novel Hierarchical Information Sharing and Transfer (HIST)
framework for multi-task learning, which simultaneously employs an implicit shared-bottom
pattern and explicit sequential transfer at the tower level. In particular, a multi-level
gating mixture-of-experts is presented for efficient bottom-level sharing. Further,
a self-attention mechanism is adopted for information transfer between task-specific
towers. Such a hierarchical task interaction scheme leads to a remarkable enhancement
in multi-task learning settings. Extensive experiments on four subsets of AliExpress
dataset unequivocally demonstrate that HIST outperforms the current state-of-the-art
methods consistently.
LLM-Enhanced Linear Autoencoders for Recommendation
- Jaewan Moon
- Seongmin Park
- Jongwuk Lee
Large language models (LLMs) have been widely adopted to enrich the semantic representation
of textual item information in recommender systems. However, existing linear autoencoders
(LAEs) that incorporate textual information rely on sparse word co-occurrence patterns,
limiting their ability to capture rich textual semantics. To address this, we propose
L3AE, the first integration of LLMs into the LAE framework. L3AE effectively integrates the heterogeneous knowledge of textual semantics and user-item
interactions through a two-phase optimization strategy. (i) L3AE first constructs a semantic item-to-item correlation matrix from LLM-derived item
representations. (ii) It then learns an item-to-item weight matrix from collaborative
signals while distilling semantic item correlations as regularization. Notably, each
phase of L3AE is optimized through closed-form solutions, ensuring global optimality and computational
efficiency. Extensive experiments demonstrate that L3AE consistently outperforms state-of-the-art LLM-enhanced models on three benchmark
datasets, achieving gains of 27.6% in Recall@20 and 39.3% in NDCG@20. The source code
is available at https://github.com/jaewan7599/L3AE_CIKM2025.
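To make the closed-form flavour concrete, the sketch below solves a ridge-style linear autoencoder objective with an added term that distills an LLM-derived item-item correlation matrix; the objective and hyperparameters are illustrative stand-ins, not the paper's exact two-phase formulation.

```python
import numpy as np

def l3ae_style_closed_form(X, S, lam=100.0, gamma=10.0):
    # Solves  min_B ||X - X B||_F^2 + lam ||B||_F^2 + gamma ||B - S||_F^2  in closed form,
    # where X is the (users x items) interaction matrix and S an LLM-derived item-item
    # correlation matrix (see the lead-in for the assumptions behind this objective).
    n_items = X.shape[1]
    G = X.T @ X                                    # item-item co-occurrence (Gram matrix)
    A = G + (lam + gamma) * np.eye(n_items)
    return np.linalg.solve(A, G + gamma * S)       # closed-form item-to-item weights

# Scoring: rank unseen items for a user by user_history_vector @ B.
```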
Bayesian Privacy Guarantee for User History in Sequential Recommendation Using Randomised
Response
Sequential recommendation systems play an important role in delivering personalised
user experiences, yet they rely heavily on detailed user history, raising serious
privacy concerns. In this work, we introduce a novel framework that integrates a randomised
response mechanism into sequential recommendation to provide strong privacy guarantees
while preserving recommendation effectiveness. By obfuscating user history through
controlled probabilistic item substitution based on semantic similarity, our approach
ensures that released sequences protect individual behaviour with provable Bayesian
posterior privacy. We further propose training strategies tailored for privacy-filtered
data, including a frequency-based vocabulary expansion method inspired by subword
tokenisation. Experiments on four real-world datasets demonstrate that our approach
preserves recommendation quality under strong privacy constraints and outperforms
existing baselines even without applying privacy filters.
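The obfuscation step can be pictured with the following sketch, which keeps each history item with a fixed probability and otherwise substitutes a semantically similar item; the uniform choice over neighbours is an assumption rather than the paper's calibrated substitution distribution.

```python
import random

def randomised_response_sequence(items, neighbors, p_keep=0.8, rng=None):
    # items:     list of item ids in the original interaction sequence
    # neighbors: dict mapping item id -> list of semantically similar item ids
    # p_keep:    probability of releasing the true item unchanged
    rng = rng or random.Random()
    released = []
    for item in items:
        if rng.random() < p_keep or not neighbors.get(item):
            released.append(item)                         # release the true item
        else:
            released.append(rng.choice(neighbors[item]))  # substitute a similar item
    return released
```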
GraFS: An Integrated GNN-LLM Approach for Inferring Best Functional Substitute Products
- Favour Nerrise
- Edward W Huang
- Xiaonan Ji
- Karthik Subbian
- Danai Koutra
Identifying and ranking the most functionally substitutable products is a key challenge for improving product selection and recommendations
in e-commerce. Functionally substitutable products are items that share core functional attributes, which allow them to serve similar purposes despite variations in product
features. Traditional methods struggle with (1) accurately capturing functional similarities between products with unique attributes, and (2) ranking substitutes
that best optimize product selection and meet customer needs. We introduce GraFS (Graph-enabled Large Language Model framework for Functional Substitute Selection), which combines Large Language Models (LLMs) to extract textual,
semantic similarities from product descriptions and Graph Neural Networks (GNNs) that
learn substitution patterns from customer behavior. Specifically, LLMs generate functional similarity embeddings from unstructured, product text attributes, while GNNs aggregate
these embeddings with graph-structured customer data to predict substitution scores
and rank products within functional groups. This dual approach enables GraFS to identify the top-k most suitable functional substitutes that maximize purchase likelihood while maintaining product diversity.
Experiments on four large-scale e-commerce reviews datasets demonstrate that our framework
significantly improves upon conventional methods and better captures relationships
among functional substitute products by up to 19% in NDCG@10 and Pre@10 against baselines.
Multi-modal Adaptive Mixture of Experts for Cold-start Recommendation
- Van-Khang Nguyen
- Duc-Hoang Pham
- Huy-Son Nguyen
- Cam-Van Thi Nguyen
- Hoang-Quynh Le
- Duc-Trong Le
Recommendation systems have faced significant challenges in cold-start scenarios,
where new items with a limited history of interaction need to be effectively recommended
to users. Though multimodal data (e.g., images, text, audio, etc.) offer rich information
to address this issue, existing approaches often employ simplistic integration methods
such as concatenation, average pooling, or fixed weighting schemes, which fail to
capture the complex relationships between modalities. Our study proposes a novel Mixture
of Experts framework for multimodal cold-start recommendation (MAMEX), which dynamically
leverages latent representation from different modalities. MAMEX utilizes modality-specific
expert networks and introduces a learnable gating mechanism that adaptively weights
the contribution of each modality based on its content characteristics. This approach
enables MAMEX to emphasize the most informative modalities for each item while maintaining
robustness when certain modalities are less relevant or missing. Extensive experiments
on benchmark datasets show that MAMEX outperforms state-of-the-art models with superior
accuracy and adaptability.
Solar Forecasting with Causality: A Graph-Transformer Approach to Spatiotemporal Dependencies
- Yanan Niu
- Demetri Psaltis
- Christophe Moser
- Luisa Lambertini
Accurate solar forecasting underpins effective renewable energy management. We present
SolarCAST, a causally informed model predicting future global horizontal irradiance (GHI) at
a target site using only historical GHI from site X and nearby stations S---unlike prior work that relies on sky-camera or satellite imagery requiring specialized
hardware and heavy preprocessing. To deliver high accuracy with only public sensor
data, SolarCAST models three classes of confounding factors behind X-S correlations using scalable neural components: (i) observable synchronous variables
(e.g., time of day, station identity), handled via an embedding module; (ii) latent
synchronous factors (e.g., regional weather patterns), captured by a spatio-temporal
graph neural network; and (iii) time-lagged influences (e.g., cloud movement across
stations), modeled with a gated transformer that learns temporal shifts. It outperforms
leading time-series and multimodal baselines across diverse geographical conditions,
and achieves a 25.9% error reduction over the top commercial forecaster, Solcast. SolarCAST offers a lightweight, practical, and generalizable solution for localized solar forecasting.
Code available at https://github.com/YananNiu/SolarCAST
Externalizing Social-Cognitive Structures for User Modeling: Toward Theory-Driven
Profiling with LLMs
- Taehyung Noh
- Seungwan Jin
- Haein Yeo
- Kyungsik Han
In this paper, we propose TRIPLE (TPB-dRIven Profiling with LLM rEfinement), a dynamic
profiling framework that incorporates the Theory of Planned Behavior (TPB) into user
profile modeling. Our method (1) extracts TPB components from historical text data
to construct an initial user profile, (2) iteratively refines this profile by analyzing
discrepancies between predicted and actual behaviors, and (3) continuously updates
the user's state by incorporating newly arriving text. We evaluate TRIPLE on the LaMP
datasets, focusing on rating prediction and personalized tweet paraphrasing tasks,
using multiple open-source large language models. Experimental results demonstrate
that TRIPLE consistently outperforms existing profiling methods across all evaluation
settings. Qualitative analysis confirms that TRIPLE captures the psychological and
social mechanisms underlying users' product evaluation and description. These findings
provide empirical evidence that theory-driven user profiling can significantly improve
personalization performance in recommender systems and related applications. Our implementation
and examples of generated profiles are available at https://yestaehyung.github.io/cikm25-triple/.
Modeling Irregular Astronomical Time Series with Neural Stochastic Delay Differential
Equations
- Yongkyung Oh
- Seungsu Kam
- Dongyoung Lim
- Sungil Kim
Astronomical time series from large-scale surveys like LSST are often irregularly
sampled and incomplete, posing challenges for classification and anomaly detection.
We introduce a new framework based on Neural Stochastic Delay Differential Equations
(Neural SDDEs) that combines stochastic modeling with neural networks to capture delayed
temporal dynamics and handle irregular observations. Our approach integrates a delay-aware
neural architecture, a numerical solver for SDDEs, and mechanisms to robustly learn
from noisy, sparse sequences. Experiments on irregularly sampled astronomical data
demonstrate strong classification accuracy and effective detection of novel astrophysical
events, even with partial labels. This work highlights Neural SDDEs as a principled
and practical tool for time series analysis under observational constraints.
FairSplit: Mitigating Bias in Graph Neural Networks through Sensitivity-based Edge
Partitioning
- Indranil Ojha
- Kushal Bose
- Swagatam Das
Fairness in machine learning has become increasingly crucial, particularly in graph-based
models where biased representations can reinforce societal inequalities. Traditional
fairness-aware learning methods on graphs focus on graph rewiring, debiasing node
embeddings, adversarial learning, and additional fairness constraints. However, these
approaches often struggle to balance fairness and task performance. We propose a novel
edge partitioning strategy that creates two distinct subgraphs, maintaining a balance
between bias and diversity. We categorize edges as homophilic or heterophilic depending
on the sensitive attribute of the corresponding node pairs. An edge is s-homophilic if it joins two nodes with the same sensitivity value, otherwise s-heterophilic. The partition splits the input graph into two subgraphs, both containing all nodes,
one with only s-homophilic edges and the other with s-heterophilic ones. Using a Graph
Neural Network (GNN), we obtain independent node representations from both graphs,
which are then aggregated into a unified node embedding. To enforce fairness, we jointly
optimize a primary task loss and a fairness loss, ensuring predictive accuracy and
bias mitigation. We evaluate our approach on three benchmark datasets and find that
it achieves improved fairness metrics while maintaining accuracy comparable to that
of existing state-of-the-art methods.
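The edge partition itself is straightforward to express; the sketch below (function name assumed) splits an edge list into the s-homophilic and s-heterophilic subsets, while both resulting subgraphs retain all nodes.

```python
def split_edges_by_sensitivity(edges, sensitive):
    # edges:     iterable of (u, v) node-id pairs
    # sensitive: dict mapping node id -> sensitive-attribute value
    homophilic, heterophilic = [], []
    for u, v in edges:
        (homophilic if sensitive[u] == sensitive[v] else heterophilic).append((u, v))
    return homophilic, heterophilic
```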
Eliminating Sentiment Bias in Recommender Systems by Counterfactual Inference
- Le Pan
- Yuanjiang Cao
- Chengkai Huang
- Wenjie Zhang
- Lina Yao
Sentiment bias is a newly discovered issue in Recommender Systems (RSs). Critical users and
niche items are disadvantaged by such unfair recommendations. To mitigate this bias,
we propose a novel approach by counterfactual inference, which is implemented in two
stages. Experiment results validate that our model achieves comparable performance
in rating prediction, providing better recommendations and effectively mitigating
sentiment bias. To the best of our knowledge, this is the first work to employ counterfactual
inference on sentiment bias mitigation in RSs.
Beyond Masking: Landmark-based Representation Learning and Knowledge-Distillation
for Audio-Visual Deepfake Detection
- Chan Park
- Muhammad Shahid Muneer
- Simon S. Woo
Audio-visual deepfake detection methods demonstrate strong performance on academic
datasets but fail significantly when applied to real-world data. To address the shortcomings
of previous approaches, we utilize the dynamic information of facial landmarks. First, we propose
Landmark-based Distillation (LBD), motivated by I-JEPA's representation learning approach.
LBD utilizes KL-divergence to align facial landmark predictions from visual and audio
encoders, enforcing focus on geometric facial features rather than spurious background
information. Second, we introduce Multimodal Temporal Information Alignment (MTIA),
which employs contrastive learning to enhance temporal consistency between audio and
visual representations. We conduct experiments on academic datasets and web-based
deepfakes collected from diverse social media platforms, serving as real-world examples.
Our proposed landmark-guided distillation framework achieves computational efficiency
while improving multimodal video deepfake detection performance across a diverse range
of deepfakes compared to existing methods. The code is available at https://github.com/Ckck12/Beyond-Masking.
Federated Gradient Boosting for Financial Fraud Detection: An Empirical Study in the
Banking Sector
- Dae-Young Park
- In-Young Ko
- Taek-Ho Lee
- Junghye Lee
The development of effective fraud detection systems (FDS) is hindered by strict data
privacy regulations that prevent centralized data sharing. Federated learning (FL)
has emerged as a promising alternative, enabling collaborative model training without
exposing sensitive data. While FL has been explored in the healthcare domain, research
on its application to financial fraud detection remains relatively limited. Specifically,
FL research on real-world banking fraud types-with detailed customer, account, and
transaction data-remains underexplored. We present the first empirical study of federated
gradient boosting models for financial fraud detection in the banking sector, motivated
by their superior performance over deep learning models on tabular fraud data. We
evaluate and compare four representative federated gradient boosting models using
both a private multi-fraud banking dataset from the Financial Security Institute (FSI)
and a publicly available banking dataset, under various scenarios. Key findings include
the consistent superiority of FedXGBBagging (a federated gradient boosting model),
general vulnerability to data quantity skew, performance instability under bank join/dropout,
and limitations in detecting localized banking fraud types such as ATM skimming. The
findings from our empirical study highlight challenges and design considerations for
deploying FL-based FDSs in the banking sector.
ASAP: Unsupervised Post-training with Label Distribution Shift Adaptive Learning Rate
- Heewon Park
- Mugon Joe
- Miru Kim
- Minhae Kwon
In real-world applications, machine learning models face online label shift, where
label distributions change over time. Effective adaptation requires careful learning
rate selection: too low slows adaptation and too high causes instability. We propose
ASAP (Adaptive Shift Aware Post-training), which dynamically adjusts the learning
rate by computing the cosine distance between current and previous unlabeled outputs
and mapping it within a bounded range. ASAP requires no labels, model ensembles, or
past inputs, using only the previous softmax output for fast, lightweight adaptation.
Experiments across multiple datasets and shift scenarios show ASAP consistently improves
accuracy and efficiency, making it practical for unsupervised model adaptation.
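A minimal sketch of the shift-aware learning-rate rule follows, assuming the cosine distance is computed between the mean softmax outputs of consecutive unlabeled batches and mapped linearly into a bounded range; the exact mapping used by ASAP may differ.

```python
import numpy as np

def asap_learning_rate(prev_probs, curr_probs, lr_min=1e-5, lr_max=1e-2):
    # prev_probs, curr_probs: (N, C) softmax outputs on the previous / current
    # unlabeled batches. A larger distribution shift yields a larger learning rate.
    p = prev_probs.mean(axis=0)
    q = curr_probs.mean(axis=0)
    cos_sim = float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q) + 1e-12))
    shift = 1.0 - cos_sim                 # cosine distance, ~0 when the label mix is stable
    shift = min(max(shift, 0.0), 1.0)     # clip to [0, 1] before mapping
    return lr_min + shift * (lr_max - lr_min)
```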
SemiSegECG: A Multi-Dataset Benchmark for Semi-Supervised Semantic Segmentation in
ECG Delineation
- Minje Park
- Jeonghwa Lim
- Taehyung Yu
- Sunghoon Joo
Electrocardiogram (ECG) delineation, the segmentation of meaningful waveform features,
is critical for clinical diagnosis. Despite recent advances using deep learning, progress
has been limited by the scarcity of publicly available annotated datasets. Semi-supervised
learning presents a promising solution by leveraging abundant unlabeled ECG data.
In this study, we present SemiSegECG, the first systematic benchmark for semi-supervised semantic segmentation (SemiSeg)
in ECG delineation. We curated and unified multiple public datasets, including previously
underused sources, to support robust and diverse evaluation. We adopted five representative
SemiSeg algorithms from computer vision, implemented them on two different architectures:
the convolutional network and the transformer, and evaluated them in two different
settings: in-domain and cross-domain. Additionally, we propose ECG-specific training
configurations and augmentation strategies and introduce a standardized evaluation
framework. Our results show that the transformer outperforms the convolutional network
in semi-supervised ECG delineation. We anticipate that SemiSegECG will serve as a foundation for advancing semi-supervised ECG delineation methods
and will facilitate further research in this domain. The code repository is available
at https://github.com/bakqui/semi-seg-ecg.
A Pivot-Enhanced Question Answering Framework: Using Iterative Sub-Question Decomposition
and Answer-to-Question Verification
- Seyeon Park
- Beakcheol Jang
Question Answering (QA) in low-resource languages remains a significant challenge
due to the scarcity of high-quality training data. To address this, we propose a robust
framework for low-resource QA. Our framework enhances performance through the integration
of pivot-based translation, sub-question decomposition, and semantic consistency verification.
Our proposed approach utilizes pivoting by translating questions into a high-resource
language, and then translating the answer back into the original language. To improve
the handling of complex queries, we introduce sub-question decomposition, which breaks
down the original question into simpler sub-units for independent QA. Also, we incorporate
a reverse QA mechanism that generates a new question from the predicted answer and
measures its semantic similarity to the original question, thereby validating answer
consistency. Evaluated on the TyDi QA benchmark, the proposed framework achieves a
19.21 chrF score and 0.67 BERTScore, corresponding to at least a 12% improvement in
metrics over direct generation baselines.
FnRGNN: Distribution-aware Fairness in Graph Neural Network
Graph Neural Networks (GNNs) excel at learning from structured data, yet fairness
in regression tasks remains underexplored. Existing approaches mainly target classification
and representation-level debiasing, which cannot fully address the continuous nature
of node-level regression. We propose FnRGNN, a fairness-aware in-processing framework
for GNN-based node regression that applies interventions at three levels: (i) structure-level
edge reweighting, (ii) representation-level alignment via MMD, and (iii) prediction-level
normalization through Sinkhorn-based distribution matching. This multi-level strategy
ensures robust fairness under complex graph topologies. Experiments on four real-world
datasets demonstrate that FnRGNN reduces group disparities without sacrificing performance.
Code is available at https://github.com/sybeam27/FnRGNN.
TwinBandit Prompt Optimizer: Adaptive Prompt Optimization via Synergistic Dual MAB-Guided
Feedback
- Young-Joon Park
- Seong-Ryeong Lee
- Anh-Dung Vo
- Minsung Jung
- Daewoo Choi
A common deficiency in Automatic Prompt Engineering (APE) is the failure to strategically
employ specific failure feedback in concert with the adaptive and coordinated selection
of diverse generation strategies. To address this deficiency, we introduce TwinBandit
Prompt Optimizer (TBPO), an APE framework that employs a synergistic dual Multi-Armed
Bandit (MAB) mechanism for adaptive prompt generation, applicable to black-box Large
Language Models (LLMs). The first MAB identifies the most challenging training instances,
informing an LLM-driven feedback pipeline responsible for generating (1) parameterized
changes to guide prompt evolution and (2) n-shot example configurations. A second
MAB then adaptively selects and ranks the proposed changes based on the empirical
performance of previously generated prompts. TBPO then combines these ranked modifications
and n-shot configurations to generate child prompts with targeted enhancements that
specifically address these challenging instances. Iteratively applying this process
to the best-performing prompts, TBPO forms a closed-loop cycle, strategically generating
and exploring a tree of enhanced prompts. Benchmarking shows that TBPO achieves stronger
performance compared to state-of-the-art APE baselines, highlighted by a 3.87% higher
exact match rate on the GPQA-Diamond dataset. Our approach offers a more targeted
and adaptive APE method by strategically learning from common failures and leveraging
empirically validated generation strategies for a given dataset. More information
is available at https://github.com/yjpark-pub/tbpo_release.
EZ-Sort: Efficient Pairwise Comparison via Zero-Shot CLIP-Based Pre-Ordering and Human-in-the-Loop
Sorting
- Yujin Park
- Haejun Chung
- Ikbeom Jang
Pairwise comparison is often favored over absolute rating or ordinal classification
in subjective or difficult annotation tasks due to its improved reliability. However,
exhaustive comparisons require a massive number of annotations (O(n²)). Recent work [8] has reduced the annotation burden (O(n log n)) by actively sampling pairwise comparisons using a sorting algorithm. We further
improve annotation efficiency by (1) roughly pre-ordering items using the CLIP (Contrastive
Language-Image Pre-training) model hierarchically without training, and (2) replacing
easy, obvious human comparisons with automated ones. The proposed EZ-Sort first produces
a CLIP-based zero-shot pre-ordering, then initializes bucket-aware Elo scores, and finally runs an uncertainty-guided
human-in-the-loop MergeSort. We validated our method using datasets from three domains:
face-age estimation (FGNET) [10], historical image chronology (DHCI) [14], and retinal
image quality assessment (EyePACS) [6]. EZ-Sort reduced human annotation cost by 90.5%
compared to exhaustive pairwise comparisons and by 19.8% compared to prior work [8]
(at n = 100), while improving or maintaining inter-rater reliability. These results demonstrate
that combining CLIP-based priors with uncertainty-aware sampling yields an efficient
and scalable solution for pairwise ranking.
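The human-in-the-loop comparison step can be sketched as a comparator that resolves confident pairs automatically from the bucket-aware Elo scores and defers uncertain pairs to an annotator; the fixed Elo margin below is an assumed stand-in for the paper's uncertainty criterion.

```python
def ez_sort_comparator(elo, ask_human, margin=200.0):
    # elo:       dict mapping item id -> Elo score initialized from the CLIP pre-order
    # ask_human: callable (a, b) -> True if a should rank before b
    # Returns a comparator usable inside a merge sort.
    def compare(a, b):
        gap = elo[a] - elo[b]
        if abs(gap) >= margin:
            return gap > 0            # confident: resolve automatically from Elo
        return ask_human(a, b)        # uncertain: defer to a human annotation
    return compare
```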
LHMformer: Long-Range Historical Memory-Enhanced Transformer for Traffic Forecasting
As a typical task of multivariate time series, traffic forecasting has always held
significant application value across various fields. Most existing models are constrained
by fixed input lengths, making it difficult to resolve ambiguity in spatio-temporal
features. We revisit the synergistic effects of Transformer-based models and TCN-MLP
models in spatio-temporal data forecasting and propose the Long-Range Historical
Memory-Enhanced Transformer (LHMformer). The network consists of a transformer module
for processing short-term inputs and a historical feature extraction module for processing
long-term inputs. Experiments on six real-world traffic datasets show that the proposed
method achieves state-of-the-art results. Follow-up experiments demonstrate that the
historical feature extraction module is a key component in solving traffic flow forecasting
problems.
Towards Unveiling Predictive Uncertainty Vulnerabilities in the Context of the Right
to Be Forgotten
- Wei Qian
- Chenxu Zhao
- Yangyi Li
- Wenqian Ye
- Mengdi Huai
Currently, various uncertainty quantification methods have been proposed to provide
certainty and probability estimates for deep learning models' label predictions. Meanwhile,
with the growing demand for the right to be forgotten, machine unlearning has been
extensively studied as a means to remove the impact of requested sensitive data from
a pre-trained model without retraining the model from scratch. However, the vulnerabilities
of such generated predictive uncertainties with regard to dedicated malicious unlearning
attacks remain unexplored. To bridge this gap, for the first time, we propose a new
class of malicious unlearning attacks against predictive uncertainties, where the
adversary aims to cause the desired manipulations of specific predictive uncertainty
results. We also design novel optimization frameworks for our attacks and conduct
extensive experiments, including black-box scenarios. Notably, our extensive experiments
show that our attacks are more effective in manipulating predictive uncertainties
than traditional attacks that focus on label misclassifications, and existing defenses
against conventional attacks are ineffective against our attacks.
GLCN: Treatment Effect Estimation via Global-Local Networks with Adversarial Debiasing
Estimating Individual Treatment Effects (ITE) from observational data has been widely
applied across various domains. The challenge lies in the fact that observational
data only includes outcomes under one treatment, while potential outcomes under different
treatments need to be inferred. We propose GLCN, a novel causal effect estimation
neural network that fuses global-local modeling. The Global branch, trained on the
entire dataset, captures overall causal relationships to ensure estimation stability,
while the Local branch adopts a matching approach to borrow outcomes from neighboring
instances, thereby capturing local heterogeneity. A gating network dynamically integrates
predictions from both branches through an adaptive weighting mechanism to enhance
adaptability to local variations. A key difficulty in the Local branch lies in defining
a reasonable distance metric for neighboring instances. To address this, our method
employs a match-and-reconstruct strategy, where the reconstruction error serves as
a supervisory signal to guide learning. To mitigate confounding bias, we introduce
a confusion loss based on adversarial training. Extensive experiments on public benchmarks
and real-world industrial datasets demonstrate that our method outperforms state-of-the-art
approaches.
LLMCE: Adapting LLMs with Adversarial Debiasing for Counterfactual Estimation over
Time
Causal inference in time series data is a challenging yet crucial task in real-world
applications such as healthcare and economics, where time-varying confounders often
complicate causal effect estimation. In this work, we propose LLMCE, a novel approach
that leverages frozen LLMs for counterfactual estimation in time series settings.
First, we adapt LLMs to time series by encoding time-varying covariates, past outcomes,
and past and future treatments using a reprogramming mechanism. This aligns time series
with textual modalities, enabling LLMs to generalize efficiently for time series analysis.
Second, to address time-varying confounders, we introduce an adversarial debiasing
strategy. This ensures that learned representations predict outcomes without incorporating
incremental information about future treatments beyond what can be inferred from past
treatments. This dual objective enhances causal estimate reliability while maintaining
predictive accuracy. We evaluate LLMCE on synthetic and real-world datasets, demonstrating
superior performance compared to state-of-the-art baselines.
Land Deformation Prediction via Multi-modal Adaptive Association Learning
- Wanghui Qiu
- Shiyan Hu
- Chenjuan Guo
- Wenbing Shi
- Lina Yu
- Ming Gao
- Aoying Zhou
- Bin Yang
Accurate land deformation prediction using InSAR (Interferometric Synthetic Aperture
Radar) technology is crucial for early warning of geological disasters. However, existing
prediction methods face two major challenges: cross-area association bottleneck and
inadequate handling of temporal distribution heterogeneity. To address these challenges,
we propose Multi-modal Adaptive Association Learning framework (MAAL). For the spatial
knowledge transfer challenge, we introduce a cross-area multi-modal association learning
module that integrates multi-modal (InSAR and geological text) data to enable knowledge
transfer between areas with similar geological characteristics. For temporal distribution
heterogeneity, we develop an adaptive evolution stage recognition module that uses
distribution routers to identify different temporal patterns, then applies corresponding
linear extractors to model the heterogeneous landslide evolution. Experimental validation
on 889 hazardous areas demonstrates that MAAL outperforms baselines.
Controlled Feature Interaction Selection for Deep Sparse Networks
- Yuhang Qiu
- Biqin Song
- Hong Chen
Deep sparse networks (DSNs) have demonstrated exceptional performance for nonlinear
estimation and feature selection, which is crucial for enhancing predictive performance
and interpretability. However, existing methods often overlook feature interactions
and lack theoretical guarantees on false discovery rate (FDR), especially under interaction
scenarios. To address the above issues, this paper develops a DSN-based knockoffs inference
framework for feature interaction selection. Knockoffs inference provides a theoretical
guarantee for FDR control. Empirical evaluations on synthetic
datasets demonstrate the capabilities of our proposal on FDR control and identification
of informative interactions.
ThoughtForest-KGQA: A Multi-Chain Tree Search for Knowledge Graph Reasoning
- Xingrun Quan
- Yongkang Zhou
- Junjie Yao
Most multi-hop Knowledge Graph Question Answering (KGQA) methods utilize fixed pruning
strategies that, while efficient, critically impair the diversity of answer paths
and fail to discover complex or less common correct answers.
To address these limitations, this paper introduces ThoughtForest-KGQA, a novel multi-chain
tree search algorithm. The method employs a dual-level reinforcement learning framework
where a local-level agent optimizes individual reasoning chains by capturing fine-grained
semantic details in the knowledge graph. Concurrently, a global-level agent strategically
coordinates the simultaneous exploration of multiple chains. Comprehensive evaluations
conducted across two distinct KGQA benchmarks reveal that this approach identifies
a broader spectrum of correct answers, setting a new state-of-the-art in the field.
Towards Robust Continual Test-Time Adaptation via Neighbor Filtration
- Taki Hasan Rafi
- Amit Agarwal
- Hitesh L. Patel
- Dong-Kyu Chae
Test-Time Adaptation (TTA) aims to adapt a pre-trained source model to an unseen target
domain using unlabeled target data. Continual TTA is a more challenging
paradigm that deals with non-stationary environments during the test data adaptation.
Most existing continual TTA methods are based on pseudo-labeling, but often (1) rely
on overconfident pseudo-labels and (2) remain unstable under continual distribution
shifts leading to error accumulation and catastrophic forgetting. To tackle these
limitations, we propose Neighbor-Filtration based Continual Test-Time Adaptation (NF-CTTA),
a reliable and memory-aware adaptation framework that addresses these challenges.
NF-CTTA first calibrates pseudo-labels using class-conditional calibration error to
correct over/under-confidence of the model. To further ensure reliability, we introduce
an OOD Neighbor Filtration technique that selects a subset of high-confidence samples
based on entropy and neighbor similarity, ensuring consistency within the semantic
neighborhood. Finally, we propose a priority-guided memory buffer that retains the
most informative low-entropy samples for replay, mitigating catastrophic forgetting
across evolving test distributions. Extensive experiments across multiple domain shift
benchmarks demonstrate that NF-CTTA achieves superior performance and stability compared
to existing TTA and CTTA methods. The code is available at: https://github.com/takihasan/NF-CTTA.
Towards Understanding Bias in Synthetic Data for Evaluation
- Hossein A. Rahmani
- Varsha Ramineni
- Emine Yilmaz
- Nick Craswell
- Bhaskar Mitra
Test collections are crucial for evaluating Information Retrieval (IR) systems. Creating
a diverse set of user queries for these collections can be challenging, and obtaining
relevance judgments, which indicate how well retrieved documents match a query, is
often costly and resource-intensive. Recently, generating synthetic datasets using
Large Language Models (LLMs) has gained attention in various applications. While previous
work has used LLMs to generate synthetic queries or documents to improve ranking models,
using LLMs to create synthetic test collections is still relatively unexplored. Previous
work showed that synthetic test collections have the potential to be used for system
evaluation, however, more analysis is needed to validate this claim. In this paper,
we thoroughly investigate the reliability of synthetic test collections constructed
using LLMs, where LLMs are used to generate synthetic queries, labels, or both. In
particular, we examine the potential biases that might occur when such test collections
are used for evaluation. We first empirically show the presence of such bias in evaluation
results and analyse the effects it might have on system evaluation. We further validate
the presence of such bias using a linear mixed-effects model. Our analysis shows that
while the effect of bias present in evaluation results obtained using synthetic test
collections could be significant, e.g., when computing absolute system performance,
its effect may not be as significant when comparing relative system performance. Code
and data are available at: https://github.com/rahmanidashti/BiasSyntheticData
Quantum-Amplitude Embedded Adaptation for Parameter-Efficient Fine-Tuning in Large
Language Models
- Emily Jimin Roh
- Joongheon Kim
Large language models (LLMs) require substantial resources for task-specific adaptation,
which motivates the development of parameter-efficient fine-tuning (PEFT) methods.
This paper presents quantum-amplitude embedded adaptation (QAA), a novel PEFT framework
that logarithmically compresses activation vectors using quantum-amplitude embedding
and applies expressive non-linear transformations via parameterized quantum circuits
(PQCs). By replacing linear adapters in attention modules with compact quantum modules,
QAA achieves high expressivity while drastically reducing the number of trainable
parameters. Empirical results demonstrate that QAA performs on par with or better
than existing PEFT methods under constrained memory and compute budgets, highlighting its
potential for efficient LLM fine-tuning.
Green by Design: Detecting Environmental Claims in Corporate Web Content
- Diya Saha
- Manjira Sinha
- Tirthankar Dasgupta
Corporate entities increasingly embed environmental claims in their digital communication
to project sustainability awareness. Detecting such claims is critical for regulatory
monitoring, corporate accountability, and mitigation of greenwashing practices. However,
traditional neural network architectures, including large language models, struggle to
capture both the complex linguistic structures and the subtle stylistic cues that
characterize environmental assertions. In this work, we propose a novel Graph-Augmented
Liquid Neural Network (GLNN) architecture for automatic detection of environmental
claims in corporate web content. Our approach first models the syntactic and semantic
dependencies of text using a Graph Convolutional Network (GCN), while concurrently
encoding stylistic features derived from linguistic markers (e.g., LIWC categories)
into vector representations. These representations are concatenated and passed into
a Liquid Time-Constant (LTC) Network, which provides dynamic adaptability and low-power
efficiency by leveraging continuous-time recurrent dynamics. The integration of GCN-based
stylistic encoding with LTC networks enables the model to robustly capture both structural
dependencies and temporal signal variations inherent in corporate claims, while remaining
energy efficient. Extensive experiments on multiple open datasets demonstrate that
our model outperforms baseline neural architectures in both accuracy and computational
efficiency, highlighting the potential of graph-augmented liquid networks as a foundation
for sustainable AI in sustainability monitoring.
Towards Equitable Coreset Selection: Addressing Challenges Under Class Imbalance
- Liyana Sahir Kallooriyakath
- Anugu Namratha Reddy
- B Srinath Achary
- Ashutosh Sharma
- Krisha Shah
- Sonia Gupta
- Siddhartha Asthana
Coreset selection reduces training cost by constructing compact, representative subsets,
but existing methods largely assume balanced class distributions. Under imbalance,
this assumption yields biased subsets that discard critical minority samples and degrade
accuracy. We propose Equitable Coreset Selection (ECS), a framework tailored for imbalanced
data. ECS mitigates these issues through adaptive pruning that preserves minority
examples, class-sensitive partitioning aligned with skewed class distributions, and
stratified graph-cut selection for diverse sampling. Experiments across multiple imbalanced
datasets show that ECS improves generalization and substantially boosts minority-class
accuracy compared to standard coreset methods.
Open-Source LLM-based Relevance Assessment vs. Highly Reliable Manual Relevance Assessment:
A Case Study
- Tetsuya Sakai
- Khant Myoe Rain
- Rikiya Takehi
- Sijie Tao
- Young-In Song
There is currently a controversy as to whether LLM-based relevance assessment can
replace manual relevance assessment for evaluating search engines accurately at least
at the run level (e.g., ranking TREC runs by mean nDCG) if not at the individual topic level (e.g., computing an nDCG score for a Search
Engine Result Page). This study utilises an NTCIR web search test collection that
features highly reliable human relevance labels (reflecting the collective view of
eight independent assessors per topic) to complement prior findings from the skeptic
camp. Our experiments show that LLM-based assessment (using Llama and Qwen) cannot
replace human assessment even for ranking systems in terms of mean nDCG. More importantly,
LLM-based assessment lacks discriminative power: it misses many statistically significant
differences that manual assessment can detect. Furthermore, LLM-based assessment occasionally
yields potential false alarms in terms of statistical significance, which may let
researchers reach incorrect conclusions.
LLM-as-a-Judge in Entity Retrieval: Assessing Explicit and Implicit Relevance
- Mohammad Hossein Saliminabi
- Negar Arabzadeh
- Seyed Mohammad Hosseini
- Dimitrios Androutsos
- Morteza Zihayat
- Ebrahim Bagheri
Entity retrieval plays a critical role in information access systems, yet the development
and evaluation of retrieval models remain constrained by the limited availability
of high-quality supervision. While recent work has demonstrated the utility of large
language models (LLMs) as relevance assessors in passage and document retrieval, their
reliability in the context of entity retrieval-where targets are abstract, underspecified,
and often semantically sparse-remains unexplored. In this work, we evaluate LLM-based
judgments against two complementary supervision signals: human-annotated relevance
labels from the DBpedia-Entity benchmark and implicit feedback from user clicks in
the LaQuE dataset. We show that LLMs exhibit strong agreement with expert annotations
and replicate user click patterns with over 91% agreement, suggesting alignment with
behavioral judgments despite noisy input queries. We further identify and analyze
systematic mismatches for user clicks on irrelevant entities. Our findings establish
LLMs not only as effective annotators for entity relevance judgment-even when given
only the entity title-but also as powerful tools for predicting click-through behavior
and simulating explainable user intent. Our code, prompts, and data are publicly available
at: https://github.com/17shiraz/ClickLLM
CondFairGen: A Fair Conditional Generator for Tabular Data via Adaptive Sampling
- David Sanchez Jr.
- Anantaa Kotal
Recent advances in synthetic data generation have enabled high-fidelity modeling of
tabular datasets, yet fairness remains a peripheral concern, often addressed through
architectural modifications or fairness-aware loss functions. We introduce CondFairGen,
a fairness-aware generative model that enforces group fairness through dynamic control
of conditional exposure during training. Rather than altering the model architecture
or objective, CondFairGen reweights the sampling distribution over conditioning vectors
based on disparity metrics across protected attributes and their intersections. This
reweighting increases exposure to underrepresented or high-disparity subgroups, guiding
the model toward fairer conditional distributions. By embedding fairness directly
into the training schedule, CondFairGen offers a principled alternative to adversarial
debiasing or post hoc correction. Empirical evaluations on standard tabular benchmarks
demonstrate that CondFairGen substantially improves both marginal and intersectional
fairness metrics while preserving downstream utility. These results establish conditional
exposure as a practical and effective mechanism for fairness intervention in generative
modeling.
HF-RAG: Hierarchical Fusion-based RAG with Multiple Sources and Rankers
- Payel Santra
- Madhusudan Ghosh
- Debasis Ganguly
- Partha Basuchowdhuri
- Sudip Kumar Naskar
Leveraging both labeled (input-output associations) and unlabeled data (wider contextual
grounding) may provide complementary benefits in retrieval augmented generation (RAG).
However, effectively combining evidence from these heterogeneous sources is challenging
as the respective similarity scores are not inter-comparable. Additionally, aggregating
beliefs from the outputs of multiple rankers can improve the effectiveness of RAG.
Our proposed method first aggregates the top-ranked documents from a number of IR models
using a standard rank fusion technique for each source (labeled and unlabeled). Next,
we standardize the retrieval score distributions within each source by applying z-score
transformation before merging the top-retrieved documents from the two sources. We
evaluate our approach on the fact verification task, demonstrating that it consistently
improves over the best-performing individual ranker or source and also shows better
out-of-domain generalization.
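To make the score-standardization step concrete, a minimal sketch of per-source z-score normalization followed by a simple merge is shown below; the document ids, scores, and the max-based merge rule are illustrative assumptions, not the paper's exact fusion pipeline.

import numpy as np

def zscore(scores):
    # Standardize a score distribution to zero mean and unit variance.
    scores = np.asarray(scores, dtype=float)
    std = scores.std()
    return (scores - scores.mean()) / std if std > 0 else scores - scores.mean()

def fuse_sources(labeled_hits, unlabeled_hits, top_k=5):
    # Merge top documents from two sources after per-source z-score normalization.
    # Each argument maps doc_id -> fused retrieval score for that source (toy values).
    merged = {}
    for hits in (labeled_hits, unlabeled_hits):
        ids = list(hits)
        for doc_id, z in zip(ids, zscore([hits[d] for d in ids])):
            merged[doc_id] = max(merged.get(doc_id, -np.inf), z)
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Toy example: raw scores from the two sources live on different scales.
labeled = {"d1": 12.3, "d2": 10.1, "d3": 9.8}
unlabeled = {"d4": 0.91, "d1": 0.88, "d5": 0.40}
print(fuse_sources(labeled, unlabeled))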
Effect of Model Merging in Domain-Specific Ad-hoc Retrieval
- Taiga Sasaki
- Takehiro Yamamoto
- Hiroaki Ohshima
- Sumio Fujita
In this study, we evaluate the effect of model merging in ad-hoc retrieval tasks.
Model merging is a technique that combines the diverse characteristics of multiple
models. We hypothesized that applying model merging to domain-specific ad-hoc retrieval
tasks could improve retrieval effectiveness. To verify this hypothesis, we merged
the weights of a source retrieval model and a domain-specific (non-retrieval) model
using a linear interpolation approach. A key advantage of our approach is that it
requires no additional fine-tuning of the models. We conducted two experiments each
in the medical and Japanese domains. The first compared the merged model with the
source retrieval model, and the second compared it with a LoRA fine-tuned model under
both full and limited data settings for model construction. The experimental results
indicate that model merging has the potential to produce more effective domain-specific
retrieval models than the source retrieval model, and may serve as a practical alternative
to LoRA fine-tuning, particularly when only a limited amount of data is available.
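The linear-interpolation merge described in this abstract can be sketched in a few lines of PyTorch; the toy state dicts and the interpolation weight alpha below are purely illustrative, assuming the two models share an identical architecture.

import torch

def merge_linear(retrieval_state, domain_state, alpha=0.5):
    # Linearly interpolate two state dicts with matching keys and shapes.
    # alpha weights the source retrieval model; (1 - alpha) weights the
    # domain-specific model. The models and alpha here are illustrative only.
    merged = {}
    for name, w_ret in retrieval_state.items():
        w_dom = domain_state[name]
        merged[name] = alpha * w_ret + (1.0 - alpha) * w_dom
    return merged

# Toy example with two tiny "models" sharing an architecture.
a = {"layer.weight": torch.ones(2, 2), "layer.bias": torch.zeros(2)}
b = {"layer.weight": torch.full((2, 2), 3.0), "layer.bias": torch.ones(2)}
print(merge_linear(a, b, alpha=0.25)["layer.weight"])  # every entry equals 2.5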
cMALC-D: Contextual Multi-Agent LLM-Guided Curriculum Learning with Diversity-Based
Context Blending
- Anirudh Satheesh
- Keenan Powell
- Hua Wei
Many multi-agent reinforcement learning (MARL) algorithms are trained in fixed simulation
environments, making them brittle when deployed in real-world scenarios with more
complex and uncertain conditions. Contextual MARL (cMARL) addresses this by parameterizing
environments with context variables and training a context-agnostic policy that performs
well across all environment configurations. Existing cMARL methods attempt to use
curriculum learning to help train and evaluate context-agnostic policies, but they
often rely on unreliable proxy signals, such as value estimates or generalized advantage
estimates that are noisy and unstable in multi-agent settings due to inter-agent dynamics
and partial observability. To address these issues, we propose Contextual Multi-Agent
LLM-Guided Curriculum Learning with Diversity-Based Context Blending (cMALC-D), a
framework that uses Large Language Models (LLMs) to generate semantically meaningful
curricula and provide a more robust evaluation signal. To prevent mode collapse and
encourage exploration, we introduce a novel diversity-based context blending mechanism
that creates new training scenarios by combining features from prior contexts. Experiments
in traffic signal control domains demonstrate that cMALC-D improves both generalization
and sample efficiency compared to existing curriculum learning baselines.
Membership Inference Attack Vulnerabilities of Record Linkage Models
- Piyumi Seneviratne
- Dinusha Vatsalan
- Dali Kaafar
Record linkage plays a crucial role in integrating health, legal, and administrative
data. In domains such as healthcare, unstructured records, including clinical notes,
contain rich information, making their integration valuable for applications like
clinical trials. While deep learning models have improved linkage quality, their privacy
risks remain under-explored. We present what is, to our knowledge, the first systematic
study of membership inference attack vulnerabilities in record linkage models trained
on de-identified texts. Unlike traditional classifiers, linkage models operate on
record pairs, prompting a rethinking of what constitutes membership leakage. Does
it occur only when a record pair was seen together during training, or also when its
records were seen individually? Or when neither was seen, but the pair resembles known patterns? We introduce a black-box
attack based on semantic perturbation sensitivity, requiring no access to the model's
internals. Our findings expose a previously unaddressed membership inference vulnerability
in record linkage models: black-box attacks, even with a simple threshold-based attack
model, achieved up to 92% precision and an AUC of 0.79. Across record linkage models trained
on the MIMIC-IV and PMC-Patients datasets, we observe that perturbing training-seen phrases
causes significantly larger confidence shifts (e.g., Δs = -0.648) and higher rates of predicted-label flips (from 0 = non-match to 1 = match and vice versa; e.g., 68.75%) than perturbing unseen
variants. These preliminary results reveal membership inference vulnerabilities in
text-based linkage systems, highlighting the need for deeper investigation into privacy
risks, motivating new lines of privacy defences for pairwise models.
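A minimal sketch of a threshold-based black-box test built on perturbation sensitivity is shown below; the scoring function, perturbed records, and threshold are hypothetical placeholders rather than the attack as calibrated in the paper.

def membership_guess(score_fn, pair, perturbed_pair, threshold=0.3):
    # Toy black-box membership test based on perturbation sensitivity.
    # score_fn returns a linkage model's match confidence for a record pair;
    # the threshold and the perturbation itself are illustrative placeholders,
    # not the paper's calibrated values.
    delta = score_fn(perturbed_pair) - score_fn(pair)
    return abs(delta) >= threshold, delta

# Toy score function standing in for a trained linkage model.
fake_scores = {("r1", "r2"): 0.95, ("r1*", "r2"): 0.30,
               ("r3", "r4"): 0.20, ("r3*", "r4"): 0.18}
score = lambda pair: fake_scores[pair]
print(membership_guess(score, ("r1", "r2"), ("r1*", "r2")))   # large shift -> likely member
print(membership_guess(score, ("r3", "r4"), ("r3*", "r4")))   # small shift -> likely non-member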
Do LLMs Dream of Electric Emotions? Towards Quantifying Metacognition and Generalizing
the Teacher-Student Model Using Ensembles of LLMs
- Ricky J. Sethi
- Hefei Qiu
- Charles Courchaine
- Joshua Iacoboni
In this paper, we propose a novel framework for quantifying metacognitive processes
in ensembles of Large Language Models (LLMs) and extending the traditional teacher-student
model through the lens of dual-process cognitive theory. We introduce a Metacognitive
State Vector (MSV) that operationalizes metacognition across five dimensions: emotional
response analysis, correctness evaluation, experiential matching, conflicting information
estimation, and problem importance task prioritization.
In our formulation, the rapid, intuitive thinking of System 1 is mapped onto a smaller
''student'' (single LLM or ensemble of bagged LLMs) while the deliberate, analytical
reasoning of System 2 is mapped onto a larger ''teacher'' (ensemble of boosted LLMs)
using the MSV. Additionally, we utilize a graph-theoretic architecture to model ensemble
interactions, enabling LLMs to assume dynamic roles and transition between System
1 and System 2 for improved decision-making. We view this work as a first step towards
establishing a true measure of emergent metacognition in such systems.
FAIR Data Assessment Using LLMs: The Fair-Way
- Anmol Sharma
- Sulayman K. Sowe
- Soo-Yon Kim
- Sayed Hoseini
- Fidan Limani
- Zeyd Boukhers
- Christoph Lange
- Stefan Decker
As part of modern research practices, the FAIR data principles have become essential
for data discoverability, usability, and sharing. Existing implementations for automatically
assessing FAIR adherence (FAIRness) often suffer from limited usability, inconsistent
accuracy, and difficult-to-interpret results, as they require explicit rules tailored
to specific FAIR assessment frameworks, which are not easy to generalize. This paper
introduces Fair-Way, an open source tool that leverages Large Language Models (LLMs)
to automate FAIRness assessment. Fair-Way applies a divide-and-conquer approach to
decompose the assessment process into fine-grained tasks, as well as to split the
metadata into manageable chunks. Evaluation demonstrates that Fair-Way achieves performance
comparable to existing tools, while outperforming them in several key metrics. Moreover,
Fair-Way generalizes across FAIR assessment indicators without requiring explicitly
programmed logic and supports both structured and unstructured metadata in diverse
formats. Finally, it enables user-defined, domain-specific tests, which are typically
not supported by other systems. Overall, Fair-Way represents a scalable and flexible
solution to accelerate FAIR data practices across research domains.
The Metadata Impedance Mismatch between Databases and Programming Languages
- Vishal Sharma
- Curtis Dyreson
This paper identifies a problem with databases that support metadata. Previous research
has proposed annotating values stored in a database with metadata, such as time, security,
privacy, and quality. The metadata influences how a value is used. For example, sequenced
temporal semantics proscribes comparing a value to one alive at a different time.
But when values stored in a database are pulled into the realm of a programming language
through an API, a web service, or a user-defined function, a step-down transformation of the data occurs. The transformation strips the metadata, changing the semantics
of the value. The metadata is discarded because a programming language processes a scalar
value, not one annotated with metadata. This metadata-related impedance mismatch between databases and programming languages limits the real-world adoption of databases
that support metadata.
LLM4ES: Learning User Embeddings from Event Sequences via Large Language Models
- Aleksei Shestov
- Omar Zoloev
- Maksim Makarenko
- Mikhail Orlov
- Egor Fadeev
- Ivan Kireev
- Andrey Savchenko
This paper presents LLM4ES, a novel framework that exploits large pre-trained language
models (LLMs) to derive user embeddings from event sequences. Event sequences are
transformed into a textual representation, which is subsequently used to fine-tune
an LLM through next-token prediction to generate high-quality embeddings. We introduce
a text enrichment technique that enhances LLM adaptation to event sequence data, improving
representation quality for low-variability domains. Experimental results demonstrate
that LLM4ES achieves state-of-the-art performance in user classification tasks in
financial and other domains, outperforming existing embedding methods. The resulting
user embeddings can be incorporated into a wide range of applications, from user segmentation
in finance to patient outcome prediction in healthcare.
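A minimal sketch of turning an event sequence into text for next-token fine-tuning might look like the following; the field names and the sentence template are invented for illustration and may differ from the paper's representation and enrichment scheme.

def events_to_text(events):
    # Render an event sequence as plain text suitable for LLM fine-tuning.
    # The field names and template are hypothetical examples.
    lines = []
    for e in events:
        lines.append(f"On {e['date']} the user made a {e['category']} "
                     f"transaction of {e['amount']:.2f}.")
    return " ".join(lines)

sample = [
    {"date": "2024-01-03", "category": "grocery", "amount": 41.2},
    {"date": "2024-01-05", "category": "transport", "amount": 7.5},
]
print(events_to_text(sample))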
Company-Specific Knowledge Matters: Retrieval-Augmented Generation for Earnings Call
Answer Rehearsal
- Yung-Yu Shih
- Yun-Nung Chen
- Chung-Chi Chen
Retrieval-augmented generation (RAG) has long been used to guide generative models
in producing more accurate answers, with most discussions focusing on reading comprehension-based
question answering (QA). However, their role in real-world answer rehearsal scenarios
remains underexplored. The rise of large language models (LLMs) presents new opportunities
to develop systems that assist professionals, making this discussion both timely and
essential. This paper explores how to better support corporate executives in answering
questions from professional analysts during earnings calls. We compare the impact
of two external knowledge sources: large-scale causal knowledge graphs (KGs) and historical
Q&A records-retrieved either from a global pool or company-specific archives. Our
findings suggest that a company's historical Q&A records are more influential than
causal KGs in improving response quality. To the best of our knowledge, this is the
first study to systematically compare and analyze different knowledge resources in
answer rehearsal. Our findings also point to promising future research on
the interplay between KGs and historical QA pairs for answer rehearsal.
Zero-shot Stroke Lesion Segmentation via CAM-guided Prompting of MedSAM2
- MohammadJavad Shokri
- Yuchong Yao
- Nandakishor Desai
- Aravinda S. Rao
- Angelos Sharobeam
- Bernard Yan
- Marimuthu Palaniswami
Accurate segmentation of stroke lesions in diffusion-weighted imaging (DWI) is crucial
for clinical decision-making. However, automated infarct segmentation remains challenging
due to variable infarct sizes and locations, and it is labor-intensive, requiring
expert manual annotations for training. We propose a zero-shot framework to eliminate
the need for manual segmentation labels by leveraging weak supervision from class
activation maps (CAMs) to guide segmentation using MedSAM2, a foundation model for
3D medical image segmentation. By extracting attention maps from a fine-tuned ResNet
on DWI scans labeled with stroke etiology (cause) and combining them with intensity
information, we identify key regions and generate bounding-box prompts for MedSAM2.
Our method achieves a Dice score of 54.2 ± 5.3% without any manual segmentation labels or tuning of the MedSAM2 model, demonstrating
its potential as a scalable solution for reliable pseudo-label generation.
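The step of converting a class activation map plus intensity information into a bounding-box prompt could be sketched as follows; the thresholds and the simple AND-combination of the two maps are illustrative assumptions, not the paper's exact rule.

import numpy as np

def cam_to_bbox(cam, intensity, cam_thresh=0.6, intensity_thresh=0.5):
    # Derive a single bounding-box prompt from a CAM and an intensity map.
    # cam and intensity are 2-D arrays normalized to [0, 1]; thresholds and the
    # AND-combination are illustrative choices.
    # Returns (x_min, y_min, x_max, y_max) or None if nothing is activated.
    mask = (cam >= cam_thresh) & (intensity >= intensity_thresh)
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

cam = np.zeros((8, 8)); cam[2:5, 3:6] = 0.9
inten = np.zeros((8, 8)); inten[1:6, 2:7] = 0.8
print(cam_to_bbox(cam, inten))   # -> (3, 2, 5, 4)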
On Influence Tail Bounds in Online Social Networks
- Michael Simpson
- Laks V. S. Lakshmanan
- Venkatesh Srinivasan
- Alex Thomo
The influence estimation and maximization problems study the expected reach of a seed
set in social networks under a stochastic propagation model. Motivated by the practical
utility of characterizing the distribution of reach values, we systematically analyze
the tail behaviour of the reach of a seed set. We study tail bound query problems
that, for a given seed set, compute either the maximum reach for a given probability
threshold or the highest probability of achieving a target reach. We prove #P-hardness
and propose algorithms that balance efficiency and accuracy. We also examine tail
bound optimization problems that find a seed set maximizing reach for a target probability
or maximizing the probability of achieving a target reach, and establish strong inapproximability
results.
Experiments on real datasets demonstrate the effectiveness of our algorithms, showing
that good approximations of the actual reach for a desired probability can be computed
efficiently and that the actual reach can be very different from the expected reach.
Caption, Create, Continue: Continual Learning with Pre-trained Generative Vision-Language
Models
- Indu Solomon
- Aye Phyu Phyu Aung
- Uttam Kumar
- Senthilnath Jayavelu
Continual learning (CL) enables models to adapt to evolving data streams without catastrophic
forgetting, a fundamental requirement for real-world AI systems. However, the current
methods often depend on large replay buffers or heavily annotated datasets, which are
impractical due to storage, privacy, and cost constraints. We propose CLTS (Continual
Learning via Text-Image Synergy), a novel class-incremental framework that mitigates
forgetting without storing real task data. CLTS leverages pre-trained vision-language
models, BLIP (Bootstrapping Language-Image Pre-training) for caption generation and
stable diffusion for sample generation. Each task is handled by a dedicated Task Head,
while a Task Router learns to assign inputs to the correct Task Head using the generated
data. On three benchmark datasets, CLTS improves average task accuracy by up to 54%
and achieves 63 times better memory efficiency compared to four recent continual learning
baselines, demonstrating improved retention and adaptability. CLTS introduces a novel
perspective by integrating generative text-image augmentation for scalable continual
learning.
Filtered One-Shot Training for Quantum Architecture Search
- Seok Bin Son
- Samuel Yen-Chi Chen
- Joongheon Kim
- Soohyun Park
Quantum neural networks (QNNs) have attracted growing interest for their potential
to accelerate computation and leverage quantum supremacy. Their performance largely
depends on gate placement within parameterized quantum circuits (PQCs), making neural
architecture search (NAS) a suitable approach for discovering efficient structures.
However, applying conventional NAS to QNNs is hindered by barren plateaus, noise,
hardware constraints, and high computational costs. To address these challenges, this
paper proposes Filtered One-Shot Training for quantum architecture search, which combines
deep reinforcement learning (DRL) for constraint-aware gate placement with a one-shot
supernet for efficient weight sharing. A filtering mechanism further removes weak
paths to narrow the search space. Experimental results show that the method reduces
parameters by up to 76% while maintaining high accuracy.
Sparse and Dense Retrievers Learn Better Together: Joint Sparse-Dense Optimization
for Text-Image Retrieval
- Jonghyun Song
- Youngjune Lee
- Gyu-Hwung Cho
- Ilhyeon Song
- Saehun Kim
- Yohan Jo
Vision-Language Pretrained (VLP) models have achieved impressive performance on multimodal
tasks, including text-image retrieval, based on dense representations. Meanwhile,
Learned Sparse Retrieval (LSR) has gained traction in text-only settings due to its
interpretability and efficiency with fast term-based lookup via inverted indexes.
Inspired by these advantages, recent work has extended LSR to the multimodal domain.
However, these methods often rely on computationally expensive contrastive pre-training,
or distillation from a frozen dense model, which limits the potential for mutual enhancement.
To address these limitations, we propose a simple yet effective framework that enables
bi-directional learning between dense and sparse representations through Self-Knowledge
Distillation. This bi-directional learning is achieved using an integrated similarity
score, a weighted sum of dense and sparse similarities, which serves as a shared teacher
signal for both representations. To ensure efficiency, we fine-tune the final layer
of the dense encoder and the sparse projection head, enabling easy adaptation of any
existing VLP model. Experiments on MSCOCO and Flickr30k demonstrate that our sparse
retriever not only outperforms existing sparse baselines, but also achieves performance
comparable to, or even surpassing, its dense counterparts, while retaining the benefits
of sparse models.
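A minimal sketch of the integrated-similarity teacher signal is shown below, assuming in-batch text-image similarity matrices for the dense and sparse heads; the interpolation weight, temperature, and KL-based distillation losses are illustrative choices rather than the paper's exact objective.

import torch
import torch.nn.functional as F

def integrated_similarity_loss(dense_sim, sparse_sim, weight=0.5, tau=0.05):
    # Toy self-distillation step with an integrated similarity teacher.
    # dense_sim and sparse_sim are (batch x batch) text-image similarity matrices;
    # their weighted sum forms a shared teacher that both heads are pulled towards.
    teacher = (weight * dense_sim + (1.0 - weight) * sparse_sim).detach()
    target = F.softmax(teacher / tau, dim=-1)
    loss_dense = F.kl_div(F.log_softmax(dense_sim / tau, dim=-1), target, reduction="batchmean")
    loss_sparse = F.kl_div(F.log_softmax(sparse_sim / tau, dim=-1), target, reduction="batchmean")
    return loss_dense + loss_sparse

dense = torch.randn(4, 4, requires_grad=True)
sparse = torch.randn(4, 4, requires_grad=True)
print(integrated_similarity_loss(dense, sparse))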
Evaluating Differentially Private Generation of Domain-Specific Text
- Yidan Sun
- Viktor Schlegel
- Srinivasan Kolumam Nandakumar
- Iqra Zahid
- Yuping Wu
- Warren Del-Pinto
- Goran Nenadic
- Lam Siew Kei
- Jie Zhang
- Anil Bharath
Generative AI offers transformative potential for high-stakes domains such as healthcare
and finance, yet privacy and regulatory barriers hinder the use of real-world data.
To address this, differentially private synthetic data generation has emerged as a
promising alternative. In this work, we introduce a unified benchmark to systematically
evaluate the utility and fidelity of text datasets generated under formal Differential
Privacy (DP) guarantees. Our benchmark addresses key challenges in domain-specific
benchmarking, including choice of representative data and realistic privacy budgets,
accounting for pre-training and a variety of evaluation metrics. We assess state-of-the-art
privacy-preserving generation methods across five domain-specific datasets, revealing
significant utility and fidelity degradation compared to real data, especially under
strict privacy constraints. These findings underscore the limitations of current approaches,
outline the need for advanced privacy-preserving data sharing methods and set a precedent
regarding their evaluation in realistic scenarios.
Random-Feature Graph Neural Networks with Representation Tokenized Transformer for
Robust One-class Anomaly Detection
Real-world anomaly detection seldom enjoys abundant, perfectly curated normal data;
labels are often limited and untrustworthy, allowing genuine outliers to masquerade
as benign. We cast one-class anomaly detection into this harsh setting of label scarcity
and noise and present Random-Feature Graph Neural Networks with a Representation Tokenized
Transformer (RFGT). RFGT first partitions the feature space into random, non-overlapping
subsets, yielding multiple complementary views that dilute the impact of any corrupted
dimension. With a graph constructed from each view, a graph learner propagates the
sparse normal signal to unlabeled neighbors while tempering mislabeled anomalies.
A novel Representation Tokenized Transformer module is learned to capture cross-feature
dependencies and implicitly down-weight inconsistent signals within a data instance.
Extensive experiments on eight tabular benchmarks show that RFGT outperforms the
state-of-the-art as the amount of clean data shrinks or the label noise grows.
NuFact: Validating Numerical Assertions for Knowledge Graphs
- Mohammad Taufeeq
- Koninika Pal
Validating extracted assertions is one of the crucial steps for curating knowledge
graphs (KGs). While existing research extensively explores methods for validating
KG assertions, specifically categorical facts -- where both the Subject and
Object in triples are entities -- there remains a significant gap in the validation
of numerical assertions, where the Object represents a quantity. Moreover, general
fact-validation methods are inefficient for validating numerical claims due to their
limited coverage in KGs. Furthermore, large-language models (LLMs) exhibit limitations
in quantitative reasoning, further exacerbating the challenge. These gaps compromise
the reliability of KGs in applications that require precise numerical accuracy. Addressing
these challenges, we propose NuFact, a framework for validating numerical assertions
using evidence gathered from the web. NuFact combines the rich contextual understanding
of LLMs with manually crafted quantity-focused and temporal features derived from
extracted evidence to assess numerical claims. Experimental evaluations show that
NuFact significantly outperforms existing fact-checking baselines and popular LLM-powered
agents.
Frozen in the Middle: Hidden States Remain Unchanged Across Intermediate Layers of
Language Models
- Pavel Tikhonov
- Dmitry Ilvovsky
This paper investigates the internal mechanisms of large language models (LLMs) through
the lens of Mechanistic Interpretability (MI). We present novel findings on how information
is processed and propagated within these models. Our key contributions include: (1)
providing evidence for the localized nature of fact storage and information propagation
from subject tokens; (2) introducing a new observation that hidden states remain largely
unchanged across multiple middle layers, which we call the ''plateau'' phenomenon;
and (3) developing a manually crafted diagnostic dataset of factual prompts. Our work
complements and extends prior research on transformer information flow by demonstrating
that, contrary to the prevailing assumption of sequential representation enrichment
across layers, subject token states stabilize early and remain functionally static
throughout multiple middle layers while containing all necessary information for the
final prediction. These insights advance our understanding of how transformers process
factual information and suggest a more complex pattern of layer specialization than
previously identified.
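One simple way to observe such a plateau is to track the cosine similarity of a token's hidden state between consecutive layers; the sketch below uses synthetic hidden states purely for illustration, assuming layer outputs of shape (sequence length x hidden dimension).

import torch
import torch.nn.functional as F

def layerwise_drift(hidden_states, token_idx=-1):
    # Cosine similarity of one token's hidden state between consecutive layers.
    # hidden_states: list of (seq_len x dim) tensors, one per layer, e.g. as a
    # transformer would return with output_hidden_states=True (illustrative).
    # Values near 1.0 over a span of middle layers would correspond to the
    # "plateau" behaviour described above.
    sims = []
    for prev, cur in zip(hidden_states[:-1], hidden_states[1:]):
        sims.append(F.cosine_similarity(prev[token_idx], cur[token_idx], dim=0).item())
    return sims

# Dummy states: layers 2-5 barely change the representation.
torch.manual_seed(0)
states = [torch.randn(6, 16)]
for i in range(1, 7):
    noise = 0.01 if 2 <= i <= 5 else 1.0
    states.append(states[-1] + noise * torch.randn(6, 16))
print([round(s, 3) for s in layerwise_drift(states)])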
A Universal Framework for Offline Serendipity Evaluation in Recommender Systems via
Large Language Models
- Yu Tokutake
- Kazushi Okamoto
- Kei Harada
- Atsushi Shibata
- Koki Karube
Serendipity in recommender systems (RSs) has attracted increasing attention as a concept
that enhances user satisfaction by presenting unexpected and useful items. However,
evaluating serendipitous performance remains challenging because its ground truth
is generally unobservable. The existing offline metrics often depend on ambiguous
definitions or are tailored to specific datasets and RSs, thereby limiting their generalizability.
To address this issue, we propose a universally applicable evaluation framework that
leverages large language models (LLMs), known for their extensive knowledge and reasoning
capabilities, as evaluators. First, to improve the evaluation performance of the proposed
framework, we assessed the serendipity prediction accuracy of LLMs using four different
prompt strategies on a dataset containing user-annotated serendipitous ground truth
and found that the chain-of-thought prompt achieved the highest accuracy. Next, we
re-evaluated the serendipitous performance of both serendipity-oriented and general
RSs using the proposed framework on three commonly used real-world datasets, without
the ground truth. The results indicated that no serendipity-oriented RS
consistently outperformed the others across all datasets, and that a general RS sometimes
achieved higher serendipity performance than the serendipity-oriented RSs.
Densest Subgraph Discovery on Decentralized Graphs with Local Edge Differential Privacy
- Wenping Tong
- Yi Zhou
- Yanhao Wang
- Cen Chen
- Minghao Zhao
Various real-world graphs, such as social and transaction networks, are typically
distributed across users, each of whom holds a local view of the graph (i.e., their
own relationships with others). Densest Subgraph Discovery (DSD) on such decentralized
graphs is a fundamental task that can uncover valuable insights for downstream applications,
including fraud detection, community identification, and user behavior mining. Additionally,
in many scenarios, due to privacy concerns, sensitive original local views cannot
be collected for DSD. Although there have been extensive studies on DSD, most existing
algorithms either do not take user privacy into account or are specific to the centralized
privacy setting that requires a (trusted) curator to collect all local views from
users and then analyze the entire graph privately.
To address these issues, we consider DSD under Local Edge Differential Privacy (LEDP),
which allows users to perturb their local graph views via a randomizer such that the
presence or absence of any single edge cannot be reliably inferred from the data transmitted to the server. We propose
a new LEDP algorithm for DSD that utilizes the Randomized Response (RR) mechanism
for user-side perturbation and extends greedy peeling with degree correction to find
the densest subgraph on the server-side noisy global graph. Our proposed algorithm
provides provable privacy and approximation guarantees. Finally, we perform experimental
evaluations on real-world graphs to show that our proposed algorithm achieves better
privacy-utility trade-offs than state-of-the-art LEDP baselines for DSD.
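A minimal sketch of the user-side randomized response step is shown below, encoding each user's local view as adjacency bits; the graph, epsilon, and encoding are illustrative, and the server-side degree-corrected peeling is not reproduced here.

import math
import random

def randomized_response(local_edges, n_users, epsilon, user_id, rng=None):
    # Keep each adjacency bit with probability e^eps / (e^eps + 1) and flip it
    # otherwise -- the standard randomized response calibration for edge-level
    # local differential privacy on this bit-vector encoding.
    rng = rng or random.Random(0)
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    noisy = []
    for other in range(n_users):
        if other == user_id:
            noisy.append(0)          # no self-loops in this toy encoding
            continue
        bit = 1 if other in local_edges else 0
        noisy.append(bit if rng.random() < keep_prob else 1 - bit)
    return noisy

# User 2 of 6 reports a noisy view of their true neighbours {0, 4}.
print(randomized_response({0, 4}, n_users=6, epsilon=2.0, user_id=2))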
Channel-Independent Refiner for Multivariate Time Series Forecasting
- Jie Wang
- Zhongguang Zheng
- Chaoliang Zhong
- Jun Sun
Real-world time series data are usually multivariate with complex channel relations.
Some channels are highly related, while others show limited correlation. Channel
independence has shown great significance in multivariate time series forecasting
to capture better individual channel characteristics. In order to learn channel patterns
better, we propose a channel-independent Refiner as a plug-and-play module. Specifically,
we devise a channel-wise Refiner connected to the output of existing methods. Leveraging
the concatenation of original input and the coarse prediction from the base model,
the Refiner produces a better estimation. Our Refiner also benefits from the channel-independent
design and a post-training strategy, achieving significant improvement over base models.
Extensive experiments on iTransformer, FEDformer, Autoformer, FreTS, DLinear and TSMixer
demonstrate that our Refiner reduces forecasting errors on both Transformer-based
and MLP-based models in over 90% of the experimental settings. Our Refiner combined
with iTransformer establishes the new state-of-the-art.
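A toy version of a channel-independent refiner that consumes the concatenation of the original input and a coarse base-model forecast might look like the following PyTorch module; the layer sizes and the residual formulation are illustrative assumptions, not the paper's architecture.

import torch
import torch.nn as nn

class ChannelRefiner(nn.Module):
    # Toy channel-independent refiner applied on top of a frozen base model.
    # Each channel's history and coarse forecast are concatenated and passed
    # through a small shared MLP that outputs a correction.
    def __init__(self, lookback, horizon, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(lookback + horizon, hidden), nn.GELU(), nn.Linear(hidden, horizon)
        )

    def forward(self, history, coarse_pred):
        # history: (batch, lookback, channels); coarse_pred: (batch, horizon, channels)
        x = torch.cat([history, coarse_pred], dim=1).transpose(1, 2)  # (B, C, L+H)
        return coarse_pred + self.mlp(x).transpose(1, 2)              # residual refinement

refiner = ChannelRefiner(lookback=96, horizon=24)
hist, coarse = torch.randn(2, 96, 7), torch.randn(2, 24, 7)
print(refiner(hist, coarse).shape)   # torch.Size([2, 24, 7])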
Social Relation Meets Recommendation: Augmentation and Alignment
- Lin Wang
- Weisong Wang
- Xuanji Xiao
- Qing Li
Recommender systems are essential for modern content platforms, yet traditional behavior-based
models often struggle with cold users who have limited interaction data. Engaging
these users is crucial for platform growth. To bridge this gap, we propose leveraging
the social-relation graph to enrich interest representations from behavior-based models.
However, extracting value from social graphs is challenging due to relation noise
and cross-domain inconsistency. To address the noise propagation and obtain accurate
social interest, we employ a dual-view denoising strategy, applying low-rank SVD
to the user-item interaction matrix to obtain a denoised social graph and using contrastive learning
to align the original and reconstructed social graphs. Addressing the interest inconsistency
between social and behavioral interests, we adopt a ''mutual distillation'' technique
to isolate the original interests into aligned social/behavior interests and social/behavior
specific interests, maximizing the utility of both. Experimental results on widely
adopted industry datasets verify the method's effectiveness, particularly for cold
users, offering a fresh perspective for future research. The implementation can be
accessed at https://github.com/WANGLin0126/CLSRec.
Latent Graph Structure Learning for Large-Scale Traffic Forecasting
- Meng Wang
- Longgang Xiang
- Chenhao Wu
- Zejiao Wang
- Xin Chen
- Shaozu Xie
- Ying Luo
Large-scale traffic forecasting poses new challenges in model architecture design
due to the large data volume and high computation complexity. An effective solution
is to partition traffic observation locations into several patches on which the computation
burden decreases from the node to the patch level. However, since traffic states on the road network
are highly dynamic, current static and hard partition methods cannot fully adapt
to locations whose traffic semantics and evolving patterns vary over time. Therefore,
we assume that locations on the road network latently belong to certain graph structures and
propose an adaptive and dynamic patching method in a data-driven fashion. In addition, we
design an end-to-end framework to learn patch assignments and predict future traffic
states simultaneously. Experiments on real world large-scale traffic datasets further
verify the effectiveness and interpretability of our proposed method.
DAGP: Difficulty-Aware Graph Pruning for LLM-Based Multi-Agent System
Large Language Model-based multi-agent systems demonstrate strong capabilities across
different tasks. However, current methods often rely on static or task-specific designs.
These instance-agnostic methods overlook varying instance complexities, leading to
redundant communication in simple scenarios and insufficient coordination in demanding
ones. Consequently, they fail to achieve effective and efficient collaborative reasoning
among agents. To overcome these limitations, we propose Difficulty-Aware Graph Pruning
(DAGP), an adaptive framework that configures communication structures based on instance-specific
difficulty. DAGP integrates a difficulty estimation module and a sparsity control
mechanism to selectively activate communication edges based on each instance, promoting
efficient and targeted collaboration. Empirical evaluations across diverse benchmarks
demonstrate that DAGP consistently achieves state-of-the-art performance compared
to other baselines while reducing average token usage by 45%.
Unified Robustness via Spurious-Invariant Features and On-Manifold Adversaries
- Xu Wang
- Rajgopal Kannan
- Viktor Prasanna
Vision models fail both under tiny pixel attacks and under real-world shifts in style
or background because they latch onto spurious features. We propose a two-step, label-free
method. (1) Spurious-Invariant Self-Supervised Pre-training (SISSP) trains an encoder
to collapse representations of the same object despite randomized styles and backgrounds,
pruning shortcut signals. (2) Semantic-Alignment Adversarial Refinement (SAAR) takes
any attack and projects it back into a small ball within SISSP feature space, yielding
adversaries that look natural yet still fool the classifier. Fine-tuning with SISSP
features and SAAR images produces a ResNet-50 that retains 64% ImageNet accuracy and
46% PGD robustness, without environment labels or specialized augmentations. Together,
SISSP provides a semantics-aware metric and SAAR generates on-manifold adversaries,
achieving the first ImageNet-scale model robust to both pixel-level noise and semantic
shifts.
GenR1-Searcher: Curriculum Reinforcement Learning for Dynamic Retrieval and Document
Generation
- Yu Wang
- Yixuan Zhao
- Renrui Duan
- Jingyuan Li
- Yuanzhuo Wang
- Kun Zhang
Current RAG approaches follow two paradigms with complementary limitations: retrieve-then-read
methods access reliable sources but produce noisy and incomplete information, while
generate-then-read approaches create query-aligned documents but suffer from hallucinations.
We propose GenR1-Searcher, a curriculum-based reinforcement learning framework that enables small language models
to intelligently decide between retrieval and document generation during multi-hop
reasoning through a three-stage progressive training strategy: first learning tool
invocation syntax through format rewards, then mastering retrieval strategies with
answer-based rewards, and finally acquiring adaptive tool selection capabilities when
both knowledge sources are available. Extensive experiments on four multi-hop QA benchmarks
demonstrate that GenR1-Searcher consistently outperforms competitive baselines, including Search-o1, Search-R1, and ReARTeR,
achieving substantial relative improvements of 26.8%, 21.2%, and 15.7% on HotpotQA, 2WikiMultiHopQA, and MuSiQue, respectively.
Our analysis further reveals that the model learns principled tool selection strategies that
adapt to tool capabilities and query characteristics.
Temporal-Aware User Behaviour Simulation with Large Language Models for Recommender
Systems
- Xinye Wanyan
- Danula Hettiachchi
- Chenglong Ma
- Ziqi Xu
- Jeffrey Chan
Large Language Models (LLMs) demonstrate human-like capabilities in language understanding,
reasoning, and generation, driving interest in using LLM-based agents to simulate
human feedback in recommender systems. However, most existing approaches rely on static
user profiling, neglecting the temporal and dynamic nature of user interests. This
limitation stems from a disconnect between language modelling and behaviour modelling,
which constrains the capacity of agents to represent sequential patterns. To address
this challenge, we propose a Dynamic Temporal-aware Agent-based simulator for Recommender
Systems, DyTA4Rec, which enables agents to model and utilise evolving user behaviour
based on historical interactions. DyTA4Rec features a dynamic updater for real-time
profile refinement, temporal-enhanced prompting for sequential context, and self-adaptive
aggregation for coherent feedback. Experimental results at group and individual levels
show that DyTA4Rec significantly improves the alignment between simulated and actual
user behaviour by modelling dynamic characteristics and enhancing temporal awareness
in LLM-based agents.
Can Large Vision-Language Models Understand Multimodal Sarcasm?
- Xinyu Wang
- Yue Zhang
- Liqiang Jing
Sarcasm is a complex linguistic phenomenon that involves a disparity between literal
and intended meanings, making it challenging for sentiment analysis and other emotion-sensitive
tasks. While traditional sarcasm detection methods primarily focus on text, recent
approaches have incorporated multimodal information. However, the application of Large
Visual Language Models (LVLMs) in Multimodal Sarcasm Analysis (MSA) remains underexplored.
In this paper, we evaluate LVLMs in MSA tasks, specifically focusing on Multimodal
Sarcasm Detection and Multimodal Sarcasm Explanation. Through comprehensive experiments,
we identify key limitations, such as insufficient visual understanding and a lack
of conceptual knowledge. To address these issues, we propose a training-free framework
that integrates in-depth object extraction and external conceptual knowledge to improve
the model's ability to interpret and explain sarcasm in multimodal contexts. The experimental
results on multiple models show the effectiveness of our proposed framework. The code
is available at https://github.com/cp-cp/LVLM-MSA.
Empirical Analysis on User Profile in Personalized LLMs
- Bin Wu
- Zhengyan Shi
- Hossein A. Rahmani
- Varsha Ramineni
- Emine Yilmaz
Utilizing user profiles to personalize Large Language Models (LLMs) has been shown
to enhance performance on a wide range of tasks. However, the precise role of user
profiles and their effect mechanism on LLMs is unclear. This study first confirms
that the effectiveness of user profiles stems primarily from their personalization
information, with input-relevant information contributing meaningfully only when built
upon personalization. Furthermore, we investigate how user profiles affect the personalization
of LLMs. Within the user profile, we reveal that it is the historical personalized
response produced or approved by users that plays a pivotal role in personalizing
LLMs. This discovery unlocks the potential of LLMs to incorporate more user profiles
within the constraints of limited input length. As for the position of user profiles,
we observe that user profiles integrated into different positions of the input context
do not contribute equally to personalization. Instead, user profiles closer to the
beginning have more impact on the personalization of LLMs. Our findings clarify the
role of user profiles in LLM personalization and show how their content and placement
affect performance, offering guidance on leveraging user profiles effectively.
Fact or Facsimile? Evaluating the Factual Robustness of Modern Retrievers
- Haoyu Wu
- Qingcheng Zeng
- Kaize Ding
Dense retrievers and rerankers are central to retrieval-augmented generation (RAG)
pipelines, where accurately retrieving factual information is crucial for maintaining
system trustworthiness and defending against RAG poisoning. However, little is known
about how much factual competence these components inherit or lose from the large
language models (LLMs) they are based on. We pair 12 publicly released embedding checkpoints
with their original base LLMs and evaluate both sets on a factuality benchmark. Across
every model evaluated, the embedding variants achieve markedly lower accuracy than
their bases, with absolute drops ranging from 12 to 43 percentage points (median 28
pts) and typical retriever accuracies collapsing into the 25-35 % band versus the
60-70 % attained by the generative models. This degradation intensifies under a more
demanding condition: when the candidate pool per question is expanded from four options
to one thousand, the strongest retriever's top-1 accuracy falls from 33 % to 26 %,
revealing acute sensitivity to distractor volume. Statistical tests further show that,
for every embedding model, cosine-similarity scores between queries and correct completions
are significantly higher than those for incorrect ones (p < 0.01), indicating decisions driven largely by surface-level semantic proximity
rather than factual reasoning. To probe this weakness, we employed GPT-4.1 to paraphrase
each correct completion, creating a rewritten test set that preserved factual truth
while masking lexical cues, and observed that over two-thirds of previously correct
predictions flipped to wrong, reducing overall accuracy to roughly one-third of its
original level. Taken together, these findings reveal a systematic trade-off introduced
by contrastive learning for retrievers: gains in semantic retrieval are paid for with
losses in parametric factual knowledge, and the resulting models remain highly vulnerable
to adversarial or even benign rephrasings. Our study underscores the need for retrieval
objectives that balance similarity with factual fidelity to safeguard next-generation
RAG systems against both misinformation and targeted attacks.
Dual Context-Aware Negative Sampling Strategy for Graph-based Collaborative Filtering
- Xi Wu
- Wenzhe Zhang
- Liangwei Yang
- Xiaohan Fang
- Jiquan Peng
- Jibing Gong
Negative sampling plays a critical role in collaborative filtering (CF), as it accelerates
convergence and improves recommendation accuracy. Among recent studies, mixup-based
negative sampling has shown promising performance. However, existing methods primarily
focus on increasing the similarity between the synthesized negative and the positive
item, without considering the false positive issue commonly found in implicit feedback
scenarios. Blindly training all positive samples with overly hard negatives can magnify
the impact of false positives and hurt recommendation performance. To address this
challenge, we first provide a theoretical analysis revealing that mixup-synthesized
hard negatives implicitly reweight the similarity difference between the user's interactions
and both the positive and negative boundaries, thereby shaping the training signal.
Motivated by this, we propose a novel strategy named Dual Context-Aware Negative Sampling
(DCANS), which enhances each positive item by assessing its alignment with the user's
interest context, and simultaneously adjusts the hardness of synthesized negatives
based on their relevance to the same interest context. This strategy optimizes the
training direction toward the user's genuine preferences, mitigating the negative
impact of false positives while preserving the benefits of hard negative sampling.
Extensive experiments on three benchmark datasets demonstrate that our method achieves
consistent improvements over state-of-the-art baselines. Our PyTorch implementation
is available at https://github.com/Wu-Xi/DCANS.
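For background, the basic mixup-style hard-negative synthesis that this line of work builds on can be sketched as below; the hardness rule (similarity to the user embedding) and mixing range are illustrative and do not implement the dual context-aware adjustments of DCANS.

import torch

def mixup_negative(pos_emb, neg_embs, user_emb, alpha_range=(0.2, 0.8)):
    # Synthesize one hard negative by mixing a positive with a sampled negative.
    # Picks the candidate negative most similar to the user, then interpolates it
    # with the positive item embedding; the mixing range and the hardness rule
    # are illustrative choices.
    sims = neg_embs @ user_emb                      # hardness = similarity to the user
    hard_neg = neg_embs[sims.argmax()]
    alpha = torch.empty(1).uniform_(*alpha_range)
    return alpha * pos_emb + (1.0 - alpha) * hard_neg

torch.manual_seed(0)
user, pos = torch.randn(8), torch.randn(8)
candidates = torch.randn(32, 8)
print(mixup_negative(pos, candidates, user).shape)   # torch.Size([8])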
Harnessing Light for Cold-Start Recommendations: Leveraging Epistemic Uncertainty
to Enhance Performance in User-Item Interactions
- Yang Xiang
- Li Fan
- Chenke Yin
- Menglin Kong
- Chengtao Ji
Most recent paradigms of generative model-based recommendation still face challenges
related to the cold-start problem. Existing models addressing cold item recommendations
mainly focus on acquiring more knowledge to enrich embeddings or model inputs. However,
many models do not assess the efficiency with which they utilize the available training
knowledge, leading to the extraction of significant knowledge that is not fully used,
thus limiting improvements in cold-start performance. To address this, we introduce
the concept of epistemic uncertainty (which refers to uncertainty caused by a lack
of knowledge of the best model) to indirectly define how efficiently a model uses
the training knowledge. Since epistemic uncertainty represents the reducible part
of the total uncertainty, we can optimize the recommendation model further based on
epistemic uncertainty to improve its performance. To this end, we propose a Cold-Start
Recommendation based on Epistemic Uncertainty (CREU) framework. Additionally, CREU
is inspired by Pairwise-Distance Estimators (PaiDEs) to efficiently and accurately
measure epistemic uncertainty by evaluating the mutual information between model outputs
and weights in high-dimensional spaces. The proposed method is evaluated through extensive
offline experiments on public datasets, which further demonstrate the advantages and
robustness of CREU. The source code is available at https://github.com/EsiksonX/CREU.
A Soft-partitioned Semi-supervised Collaborative Transfer Learning Approach for Multi-Domain
Recommendation
- Liu Xiaoyu
- Yiqing Wu
- Ruidong Han
- Fuzhen Zhuang
- Xiang Li
- Wei Lin
In industrial practice, Multi-domain Recommendation (MDR) plays a crucial role. Shared-specific
architectures are widely used in industrial solutions to capture shared and unique
attributes via shared and specific parameters. However, with imbalanced data across
different domains, these models face two key issues: (1) Overwhelming: Dominant domain
data skews model performance, neglecting non-dominant domains. (2) Overfitting: Sparse
data in non-dominant domains leads to overfitting in specific parameters. To tackle
these challenges, we propose Soft-partitioned Semi-supervised Collaborative Transfer
Learning (SSCTL) for multi-domain recommendation. SSCTL generates dynamic parameters
to address the overwhelming issue, thus shifting focus towards samples from non-dominant
domains. To combat overfitting, it leverages pseudo-labels with weights from dominant
domain instances to enhance non-dominant domain data. We conduct comprehensive experiments,
both online and offline, to validate the efficacy of our proposed method. Online tests
yielded significant improvements across various domains, with increases in GMV ranging
from 0.54% to 2.90% and enhancements in CTR ranging from 0.22% to 1.69%.
Exploring the Potential of Pre-Trained Language Models in Long-Term Semantic Scene
Change Prediction Using Variable Scene Graphs
- Haoyi Xiu
- Xin Liu
- Taehoon Kim
- Kyoung-Sook Kim
The 3D Variable Scene Graph (3DVSG) is a newly emerging representation for modeling
dynamic environments, extending scene graphs by introducing a node-level property
called variability, which quantifies the likelihood of semantic change over time.
In this work, we explore the integration of pre-trained language models (PLMs) into
variability estimation. This is of significant practical importance because variability
estimation suffers from data scarcity and severe class imbalance. PLMs provide a rich
general semantic knowledge that can enhance representation learning in such settings.
We systematically evaluate PLM embeddings across different graph neural networks (GNNs).
We introduce a template-based text structuring (TTS) method to understand the effect of input
formatting. Our experiments show that PLM embeddings significantly improve variability
estimation performance, with effectiveness influenced by both embedding and GNN choices.
Also, we demonstrate that text structure can significantly affect embedding quality.
Lastly, we demonstrate that PLM embeddings yield reliable gains in variability estimation
and downstream active change detection.
Enhancing Graph Collaborative Filtering with FourierKAN Feature Transformation
- Jinfeng Xu
- Zheyu Chen
- Jinze Li
- Shuo Yang
- Wei Wang
- Xiping Hu
- Edith Ngai
Graph Collaborative Filtering (GCF) has emerged as a dominant paradigm in modern recommendation
systems, excelling at modeling complex user-item interactions and capturing high-order
collaborative signals. Most existing GCF models predominantly rely on simplified graph
architectures like LightGCN, which strategically remove feature transformation and
activation functions from vanilla graph convolution networks. Through systematic analysis,
we reveal that feature transformation in message propagation can enhance model representation,
though at the cost of increased training difficulty. To this end, we propose FourierKAN-GCF,
a novel framework that adopts Fourier Kolmogorov-Arnold Networks as efficient transformation
modules within graph propagation layers. This design enhances model representation
while decreasing training difficulty. Our FourierKAN-GCF can achieve higher recommendation
performance than most widely used GCF backbone models and can be integrated into existing
advanced self-supervised models as a backbone, replacing their original backbone to
achieve enhanced performance. Extensive experiments on three public datasets demonstrate
the superiority of FourierKAN-GCF.
Multi-Item-Query Attention for Stable Sequential Recommendation
- Mingshi Xu
- Haoren Zhu
- Wilfred Siu Hung Ng
The inherent instability and noise in user interaction data challenge sequential recommendation
systems. Prevailing masked attention models, relying on a single query from the most
recent item, are sensitive to this noise, reducing prediction reliability. We propose
the Multi-Item-Query attention mechanism (MIQ-Attn) to enhance model stability and
accuracy. MIQ-Attn constructs multiple diverse query vectors from user interactions,
effectively mitigating noise and improving consistency. It is designed for easy adoption
as a drop-in replacement for existing single-query attention. Experiments show MIQ-Attn
significantly improves performance on benchmark datasets.
In-context Pre-trained Time-Series Foundation Models adapt to Unseen Tasks
- Shangqing Xu
- Harshavardhan Kamarthi
- Haoxin Liu
- B. Aditya Prakash
Time-series foundation models (TSFMs) have demonstrated strong generalization capabilities
across diverse datasets and tasks. However, existing foundation models are typically
pre-trained to enhance performance on specific tasks and often struggle to generalize
to unseen tasks without fine-tuning. To address this limitation, we propose augmenting
TSFMs with In-Context Learning (ICL) capabilities, enabling them to perform test-time
inference by dynamically adapting to input-output relationships provided within the
context. Our framework, In-Context Time-series Pre-training (ICTP), restructures the
original pre-training data to equip the backbone TSFM with ICL capabilities, enabling
adaptation to unseen tasks. Experiments demonstrate that ICTP improves the performance
of state-of-the-art TSFMs by approximately 11.4% on unseen tasks without requiring
fine-tuning.
Multi-Behavior Intent Disentanglement for Recommendation via Information Bottleneck
Principle
- Tongxin Xu
- Chenzhong Bin
- Cihan Xiao
- Yunhui Li
- Tianlong Gu
In e-commerce, recommender systems help users find suitable products by leveraging
diverse behaviors, e.g., view, cart and buy. In recent years, multi-behavior recommender
systems have made strides by integrating auxiliary behaviors with purchase histories
to deliver high-quality recommendations. However, most existing methods often fail
to identify spurious correlation intents within auxiliary behaviors that conflict
with users' target intents. Indiscriminately incorporating such correlations into
the prediction of target intents may lead to performance degradation. Toward this
end, we propose a Multi-Behavior Intent Disentanglement (MBID) framework based on
Information Bottleneck (IB) principle, which focuses on disentangling spurious correlation
intents in multi-behavior recommendations. In particular, we design a projection-based
intent extraction method to decompose the genuine and spurious correlation intents
in auxiliary behaviors. Building on this, we conceive an IB-based multi-intent learning
task to disentangle the spurious correlation intents and transfer the genuine correlation
intents from auxiliary behaviors into the target behavior, yielding high-quality target
intent representations. Experiments on three real-world datasets show MBID significantly
outperforms the state-of-the-art baselines by effectively disentangling the spurious
correlation intents.
Forecasting the Buzz: Enriching Hashtag Popularity Prediction with LLM Reasoning
- Yifei Xu
- Jiaying Wu
- Herun Wan
- Yang Li
- Zhen Hou
- Min-Yen Kan
Hashtag trends ignite campaigns, shift public opinion, and steer millions of dollars
in advertising spend, yet forecasting which tag goes viral is elusive. Classical regressors
digest surface features but ignore context, while large language models (LLMs) excel
at contextual reasoning but misestimate numbers. We present BuzzProphet, a reasoning-augmented
hashtag popularity prediction framework that (1) instructs an LLM to articulate a
hashtag's topical virality, audience reach, and timing advantage; (2) utilizes these
popularity-oriented rationales to enrich the input features; and (3) regresses on
these inputs. To facilitate evaluation, we release HashView, a 7,532-hashtag benchmark
curated from social media. Across diverse regressor-LLM combinations, BuzzProphet
reduces RMSE by up to 2.8% and boosts correlation by 30% over baselines, while producing
human-readable rationales. Results demonstrate that using LLMs as context reasoners
rather than numeric predictors injects domain insight into tabular models, yielding
an interpretable and deployable solution for social media trend forecasting.
Measuring Uncertainty in Medical Image Diagnosis via Conformal Focal Loss
- Ao Yang
- Xiaodong Yue
- Yufei Chen
Medical image diagnosis inherently involves uncertainty due to artifacts, occlusions,
and ambiguous visual patterns, often leading to high inter-observer variability. While
deep neural networks offer strong predictive performance, their outputs tend to be
overconfident and poorly calibrated, limiting their clinical reliability. We propose
Conformal Focal Loss (CFL), a principled approach that leverages the focal loss and
the statistical validity of conformal prediction to better characterize diagnostic
uncertainty. By emphasizing hard or ambiguous examples, CFL enables more accurate
estimation of both predictive confidence and ambiguity. We evaluate CFL on diagnostic
tasks using both clean and noise-augmented datasets, demonstrating its ability to
effectively identify uncertain cases while maintaining robust classification performance
under label noise.
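For reference, the standard focal-loss term that emphasizes hard or ambiguous examples is sketched below; how CFL couples it with conformal calibration is not reproduced here, and the gamma value is an illustrative default.

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    # Standard multi-class focal loss: down-weights well-classified examples so
    # training emphasizes hard or ambiguous cases. The conformal-prediction side
    # of CFL (calibrated prediction sets) is not shown here.
    log_p = F.log_softmax(logits, dim=-1)
    log_p_true = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)
    p_true = log_p_true.exp()
    return (-(1.0 - p_true) ** gamma * log_p_true).mean()

logits = torch.tensor([[2.0, 0.1, -1.0], [0.2, 0.1, 0.3]])
targets = torch.tensor([0, 2])
print(focal_loss(logits, targets, gamma=2.0))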
Advanced News Event Clustering via Topic Enhanced Modeling with Multi-Aspect Contrastive
Learning
- Hang Yang
- Xiaoyan Yu
- Dianbo Sui
News event clustering, a crucial task for discovering and comprehending real-world
information, aims to aggregate news articles into fine-grained clusters based on specific
key events. Due to the presence of topic-unrelated documents within clusters and redundant
information within individual documents, it is challenging to learn a discriminative
document representation. To address this issue, we introduce a novel method, TECL
(Topic Enhanced modeling with Contrastive Learning), that leverages topic enhanced
modeling with multi-aspect contrastive learning for news event clustering. The topic-enhanced
modeling employs neural topic models to incorporate global and local semantics into
document representations, while contrastive learning refines these representations
from both inter-document and intra-document perspectives. Experiments conducted in
both unsupervised and supervised scenarios indicate that the proposed method significantly
improves performance, demonstrating its effectiveness.
HCLeK: Hierarchical Compression of Legal Knowledge for Retrieval-Augmented Generation
- Jianhui Yang
- Huanghai Liu
- Mingruo Yuan
- YiRan Hu
- Yun Liu
- Weixing Shen
- Ben Kao
Prompt compression for Retrieval-Augmented Generation (RAG) often fails by treating
all retrieved information uniformly. This undifferentiated approach neglects the critical
distinction between foundational core knowledge and illustrative practical knowledge,
a failure especially damaging in hierarchical domains like law where essential principles
can be discarded for redundant details, diminishing information gain.
To address this, we propose HCLeK, a Hierarchical Compression framework for Legal Knowledge. HCLeK uniquely leverages
high-density core knowledge to guide the hierarchical compression of voluminous practical
knowledge. The framework operates in three stages: (1) Core-Knowledge Guided Reranking to prioritize practical knowledge based on its semantic relevance to core legal principles;
(2) Priority-Decay Budget Allocation to dynamically assign compression budgets, focusing on the most salient information;
and (3) Relevance-Diversity Aware Semantic Compression for fine-grained sentence-level compression. Experimental results on the complex
task of Legal Judgment Prediction (LJP) validate that HCLeK achieves state-of-the-art
performance across various high compression ratios (0.5--0.05), demonstrating its
effectiveness and robustness. Our code is available at https://github.com/fupanY/HCLeK.
USE-LTV: Customer LifeTime Value Prediction via Uncertain Sequence Modeling in Baidu
Ads
- Lei Yang
- Jiahui Zhang
- Guoyu Liu
- Houzhi Wang
- Zhiyuan Zhou
- Xiaohui Zhao
Accurate customer LifeTime Value (LTV) predictions are of critical importance for
evaluating the efficiency of customer management strategies, which could enhance advertising
placement for better decision-making in ad systems. However, existing solutions for
LTV prediction usually rely on determined historical sequence data, which are challenging
to apply in Baidu ads due to two unique features including (i) uncertain behavior
sequence of customers caused by dynamic advertising strategies, (ii) complex and long-tail
distribution of LTV caused by continuous customer behaviors and unique business pattern
of search and news feed ads in Baidu. To incorporate these new factors, we propose
an Uncertain behavior Sequence modeling framework to predict customer LifeTime Value (USE-LTV), where we (i) utilize a transformer module to extract uncertain sequence features,
and develop a dynamic weight mechanism to capture differentiated information under
uncertain behavior sequences, and (ii) design a continuous loss function tailored to the
real-world long-tail exponential LTV distribution in Baidu ads. We extensively evaluate
our method on industrial-scale real-world data from Baidu, one of the world's
largest ad platforms, and demonstrate that USE-LTV achieves an 11.64% NMAE improvement
for one-year LTV prediction over the state-of-the-art method.
RepMedGAN: Self-supervised Representation-guided Medical GAN for Label-free Medical
Image Synthesis
- Yuchong Yao
- Nandakishor Desai
- Marimuthu Palaniswami
Medical image synthesis addresses healthcare data scarcity by generating realistic
samples for clinical support systems, AI training, and research. However, the field
faces challenges due to the complexity of imaging data with its diverse modalities,
characteristics, and disease variations. To produce high-quality images, medical image
synthesis typically relies on conditional generation, where labels and annotations
serve as essential conditions that provide critical guidance signals during the generation
process to control desired semantics and fidelity. However, in the medical domain,
labels are often inaccessible due to the high cost of annotation, requirements for
clinical expertise, as well as ethical concerns. To address this critical challenge,
we propose RepMedGAN, a novel self-supervised representation-guided image generation
framework that enhances label-free medical image synthesis by leveraging self-supervised
learning representations, enabling high-quality generation across different modalities
without requiring labels or annotations. Our framework incorporates a Self-supervised
Guidance Module that provides rich semantic knowledge during training and introduces
a Guidance Representation Generator to bridge the train-inference disparity. Through
extensive evaluation across four diverse medical datasets including brain MRI, chest
X-ray, kidney CT, and eye glaucoma images, we demonstrate that RepMedGAN consistently
achieves state-of-the-art results across multiple metrics and produces superior-quality
medical images.
Rethinking Masked Image Modeling for Ultrasound Image Denoising
- Yuchong Yao
- Nandakishor Desai
- Marimuthu Palaniswami
Ultrasound imaging serves as an important clinical diagnostic modality due to its
non-invasive, radiation-free, and real-time capabilities. However, ultrasound images
suffer from speckle noise that significantly compromises diagnostic accuracy and clinical
interpretation. Traditional denoising methods are limited by speckle noise's signal-dependent
nature, often removing important diagnostic features. While deep learning performs
better, it requires large labelled datasets that are difficult to obtain due to privacy
concerns and annotation costs. Self-supervised learning through masked image modeling
(MIM) shows potential in addressing data scarcity, but conventional MIM, developed
for high-level vision tasks, is unsuitable for low-level tasks like image denoising
due to its framework architecture and learning strategy. To this end, we propose Image
Denoising Masked Image Modeling (ID-MIM), the first MIM framework for ultrasound image
denoising. ID-MIM incorporates a novel high-frequency oriented dual-branch masking
and a specialized learning objective for noise reduction. Our encoder-only architecture
features a multi-scale hierarchical transformer with dynamic skip connections, where
the encoder directly performs denoising rather than relying on separate decoder reconstruction
as in conventional MIM approaches. Extensive experiments demonstrate the superior
performance of our ID-MIM framework across diverse noise scenarios, establishing new
state-of-the-art results.
Can LLMs Really Help Query Understanding In Web Search? A Practical Perspective
- Dezhi Ye
- Ye Qin
- Bowen Tian
- Jiabin Fan
- Jie Liu
- Haijin Liang
- Jin Ma
As a core module of web search, query understanding aims to bridge the semantic gap
between user queries and web page documents, thereby enhancing the ability to deliver
more relevant results. Recently, Large Language Models (LLMs) have achieved significant
breakthroughs that have fundamentally altered the workflow of existing search ranking
tasks. However, few researchers have explored the integration of LLMs into the field
of query understanding. In this paper, we investigate the potential of LLMs in query
understanding by conducting a comprehensive evaluation across three dimensions: term,
structure, and topic. This evaluation includes several representative tasks such as
segmentation, term weighting, error correction, query expansion, and intent recognition.
The experimental results reveal that LLMs are particularly effective in query expansion
and intent recognition but show limited improvement in other areas. This limitation
may be attributed to LLMs' primary focus on modeling the semantic knowledge of entire
queries, while lacking the capability to capture token-level information with finer
granularity. Additionally, we explore potential practical applications of LLMs in
query understanding, such as integrating the evaluation and training capabilities
of smaller models with LLMs and constructing unsupervised samples. Based on comprehensive
empirical results, collaborative training emerges as a promising approach to leverage
LLMs for query understanding. We hope this research will advance the practical application
of LLMs in query understanding and contribute to the development of this field.
AR2: Adversarial Reinforcement Learning for Abstract Reasoning in Large Language Models
- Cheng-Kai Yeh
- Hsing-Wang Lee
- Chung-Hung Kuo
- Hen-Hsen Huang
Abstraction--the ability to recognize and distill essential computational patterns
from complex problem statements--is a foundational skill in computer science, critical
both for human problem-solvers and coding-oriented large language models (LLMs). Despite
recent advances in training LLMs for code generation using reinforcement learning
(RL), most existing approaches focus primarily on superficial pattern recognition,
overlooking explicit training for abstraction. In this study, we propose AR2 (Adversarial Reinforcement Learning for Abstract Reasoning), a novel framework explicitly
designed to enhance the abstraction abilities of LLMs. AR2 employs a teacher model to transform kernel problems into narrative-rich, challenging
descriptions without changing their fundamental logic. Simultaneously, a student coding
model is trained to solve these complex narrative problems by extracting their underlying
computational kernels. Experimental results demonstrate that AR2 substantially improves the student model's accuracy on previously unseen, challenging
programming tasks, underscoring abstraction as a key skill for enhancing LLM generalization.
MOVER: Multimodal Optimal Transport with Volume-based Embedding Regularization
Recent advances in multimodal learning have largely relied on pairwise contrastive
objectives to align different modalities, such as text, video, and audio, in a shared
embedding space. While effective in bi-modal setups, these approaches struggle to
generalize across multiple modalities and often lack semantic structure in high-dimensional
spaces. In this paper, we propose MOVER, a novel framework that combines optimal transport-based
soft alignment with volume-based geometric regularization to build semantically aligned
and structured multimodal representations. By integrating a transport-guided matching
mechanism with a geometric volume minimization objective (GAVE), MOVER encourages
consistent alignment across all modalities in a modality-agnostic manner. Experiments
on text-video-audio retrieval tasks demonstrate that MOVER significantly outperforms
prior state-of-the-art methods in both zero-shot and finetuned settings. Additional
analysis shows improved generalization to unseen modality combinations and stronger
structural consistency in the learned embedding space.
LLM-based Interactive Coding Education via Predictive Query Management and Student-Centered
Fine-Tuning: Design and Implementation with 1500-Student Class Data
- Geonjae Youn
- Jonghoon Lee
- Joongheon Kim
- Chuck Yoo
Large-scale university courses face significant challenges. Teaching assistants are
overwhelmed by the large number of student questions, limiting their ability to provide
detailed and individualized support. As a result, students, especially those who are struggling, receive less tailored assistance, further widening gaps in academic performance. To address these challenges, we propose the student-centered AI learning assistant (SCALA), a large language model (LLM)-based interactive tutoring system that incorporates student needs and learning expectations. SCALA consists of two main components: predictive query management and student-centered fine-tuning. The first component anticipates common student questions via LLM agent debate. Each agent draws on a combination of lecture content and student interactions from chat logs, collaboratively predicting what students are likely to ask. This supports student learning by generating and presenting relevant queries that guide their study. The second component is fine-tuned on a 14k-question Python-tutoring dataset, curated based on in-depth student interviews to reflect real learning expectations. Our real-world experiments with 1500-student large-scale Python classes demonstrate that SCALA delivers more helpful and accurate responses than closed-source models (e.g., GPT-4o), while significantly reducing latency.
CPSRank: Unsupervised Keyphrase Extraction via Contextual Perturbation
- Hyunwook Yu
- Minju Kim
- Euijin Kim
- Mucheol Kim
The importance of a phrase within a document becomes most evident through its absence
rather than its presence. Inspired by this observation, we redefine keyphrases as
those whose removal most disrupts the document's meaning. Traditional unsupervised
methods typically rely on document-level signals, such as term frequency or phrase-to-document similarity, which overlook the contextual contribution of a phrase. This
paper proposes CPSRank, an unsupervised keyphrase extraction method that evaluates
the semantic importance of candidate phrases via a contextual perturbation score (CPS).
The CPS quantifies the critical role of each phrase by combining contextual perturbation
and content loss. CPSRank outperforms existing baselines in terms of F1 scores while
providing deeper insights into the semantic value of keyphrases. We release our code
at https://github.com/Splo2t/CPSRank.
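A minimal sketch of the perturbation idea: a candidate phrase's importance is approximated by how much the document embedding shifts when that phrase is removed. The encoder choice (sentence-transformers) and the score definition here are assumptions for illustration, not CPSRank's exact contextual perturbation score.

```python
# Illustrative contextual-perturbation score: importance ~ embedding shift on removal.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice

def perturbation_score(document: str, phrase: str) -> float:
    perturbed = document.replace(phrase, "")
    emb_doc, emb_pert = model.encode([document, perturbed], convert_to_tensor=True)
    # A larger drop in similarity means the phrase carries more contextual meaning.
    return 1.0 - util.cos_sim(emb_doc, emb_pert).item()

doc = "Graph neural networks aggregate neighbor features to learn node representations."
for cand in ["Graph neural networks", "neighbor features", "to"]:
    print(cand, round(perturbation_score(doc, cand), 4))
```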
Query, Decompose, Compress: Structured Query Expansion for Efficient Multi-Hop Retrieval
Large Language Models (LLMs) have been increasingly employed for query expansion.
However, their generative nature often undermines performance on complex multi-hop
retrieval tasks by introducing irrelevant or noisy information. To address this challenge,
we propose DeCoR (Decompose and Compress for Retrieval), a framework grounded in structured
information refinement. Rather than generating additional content, DeCoR strategically
restructures the query's underlying reasoning process and distills supporting evidence
from retrieved documents. It consists of two core components tailored to the challenges
of multi-hop retrieval: (1) Query Decomposition, which decomposes a complex query
into explicit reasoning steps, and (2) Query-aware Document Compression, which synthesizes
dispersed evidence from candidate documents into a concise summary relevant to the
query. This structured design ensures that the final query representation remains
both robust and comprehensive. Experimental results demonstrate that, despite utilizing
a relatively small LLM, DeCoR outperforms strong baselines that rely on larger models.
This finding underscores that, in complex retrieval scenarios, judiciously leveraging
the reasoning and summarization capabilities of LLMs offers a more efficient and effective
solution than relying solely on their generative capability.
Robust Handwritten Text Recognition via Multi-Source Adversarial Domain Adaptation
for Low-Resource Scripts
- Bustami Yusuf
- Hen-Hsen Huang
Transformer-based Optical Character Recognition (OCR) models perform well on common
tasks but struggle to generalize to low-resource settings, handwritten text, or diverse
scripts such as Jawi. Traditional adaptation methods often require labeled target
data or fall short under severe domain shifts. We tackle this challenge with Unsupervised
Multi-Source Heterogeneous Domain Adaptation (UMSHDA) for Handwritten Text Recognition
(HTR). We adapt Domain-Adversarial Neural Networks (DANN) and Prototype Learning for
Adversarial Domain Adaptation (PLADA) within a Transformer-based OCR architecture.
We introduce two novel multi-adversarial adaptation strategies: (1) an Additive strategy
that jointly optimizes a domain classification loss and a prototype-based alignment
loss, and (2) an Integrative strategy that uses a prototype-driven adversarial signal
to augment the domain classifier with semantic constraints. Experimental results show
that our methods significantly outperform non-adaptive and DANN baselines on challenging
handwriting tasks. For Jawi handwriting, our models achieve a relative error reduction
of up to 22%. We also demonstrate superior cross-script generalization to German handwriting,
achieving near-perfect performance, with a CER reduced by more than 50% relative to
the DANN baseline. This work demonstrates that tailored multi-adversarial domain adaptation
can effectively bridge significant domain gaps, enabling robust recognition accuracy
for complex, low-resource HTR.
Ultra Fast Warm Start Solution for Graph Recommendations
- Viacheslav Yusupov
- Maxim Rakhuba
- Evgeny Frolov
In this work, we present a fast and effective linear approach for updating recommendations in UltraGCN, a scalable graph-based recommender system. Solving this task is extremely
important to maintain the relevance of the recommendations under the conditions of
a large amount of new data and changing user preferences. To address this issue, we
adapt the simple yet effective low-rank approximation approach to the graph-based
model. Our method delivers instantaneous recommendations that are up to 30 times faster
than conventional methods, with gains in recommendation quality, and demonstrates
high scalability even on large-catalogue datasets.
Uncertainty Quantification for Multiple-Choice Questions is Just One-Token Deep
- Qingcheng Zeng
- Mingyu Jin
- Qinkai Yu
- Zhenting Wang
- Wenyue Hua
- Guangyan Sun
- Yanda Meng
- Shiqing Ma
- Qifan Wang
- Felix Juefei-Xu
- Fan Yang
- Kaize Ding
- Ruixiang Tang
- Yongfeng Zhang
Multiple-choice question (MCQ) benchmarks such as MMLU and GPQA are widely used to
assess the capabilities of large language models (LLMs). While accuracy remains the
standard evaluation metric, recent work has introduced uncertainty quantification
(UQ) methods, such as entropy, conformal prediction, and verbalized confidence, as
complementary measures of model reliability and calibration. However, we find that
these UQ methods, when applied to MCQ tasks, are unexpectedly fragile. Specifically,
we show that fine-tuning a model on just 1,000 examples to adjust the probability
of the first generated token, under the common prompting setup where the model is
instructed to output only a single answer choice, can systematically distort a broad
range of UQ methods across models, prompts, and domains, all while leaving answer
accuracy unchanged. We validate this phenomenon through extensive experiments on five
instruction-tuned LLMs, tested under standard prompting, zero-shot chain-of-thought
reasoning, and a biomedical question answering setting. In all cases, models retain
similar accuracy but exhibit significantly degraded calibration. These results suggest
that current UQ practices for MCQs are ''one-token deep'', driven more by first-token
decoding behavior than by any deeper representation of uncertainty, and are easily
manipulated through minimal interventions. Our findings call for more robust and interpretable
approaches to uncertainty estimation, particularly in structured formats like MCQs,
where confidence signals are often reduced to token-level heuristics.
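To make the "one-token deep" point concrete, the sketch below computes an entropy-based confidence from only the first generated token's logits over the option letters, the common MCQ setup the abstract critiques; the option tokens and logit values are made up for illustration.

```python
# First-token uncertainty for an MCQ, assuming the model's first-token logits are given.
import math

def first_token_entropy(option_logits):
    """Softmax over the option letters' first-token logits, then Shannon entropy."""
    m = max(option_logits.values())
    exp = {k: math.exp(v - m) for k, v in option_logits.items()}
    z = sum(exp.values())
    probs = {k: v / z for k, v in exp.items()}
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    return probs, entropy

# Small shifts in these logits change the entropy while leaving the argmax (accuracy) intact.
probs, h = first_token_entropy({"A": 3.1, "B": 1.2, "C": 0.4, "D": 0.3})
print(probs, round(h, 3))
```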
PMTA: Perception-Aware Multi-Task Transformer Network for Personalized Multi-Domain
Adaptation
- Chenbin Zhang
- Xiaoxie Zhu
- Xingchao Cao
- Qiwei Chen
- Feng Zhang
- Yang Xiao
- Zuotao Liu
The escalating complexity of industrial recommendation systems, characterized by diverse
user behaviors and cross-domain application scenarios, necessitates advanced multi-task
and multi-domain learning paradigms. Existing methods often struggle with efficient
knowledge transfer across tasks and domains due to semantic gaps and distribution
shifts. To address these challenges, we propose the Perception-Aware Multi-Task Transformer Network for Personalized Multi-Domain Adaptation (PMTA), a unified framework that integrates three key innovations: First, the Task Prompt Encoding (TPE) module dynamically generates prompts by synthesizing personalized user data with
task-specific information. Second, the Transformer-based Multi-Task Perception (TMPN) network enables adaptive cross-task knowledge transfer through attention mechanisms.
Third, the Multi-Domain Adaptation (MDAN) component captures domain-specific behavior patterns via learnable prior information.
Experimental results demonstrate PMTA's effectiveness, achieving a 0.168% increase in watch time and significant improvements in engagement metrics (AAD: +0.0113%, AAH: +0.0608%). Deployed on Douyin and Douyin Lite, it significantly improves recommendation quality
and drives commercial success.
PrLM: Learning Explicit Reasoning for Personalized RAG via Contrastive Reward Optimization
- Kepu Zhang
- Teng Shi
- Weijie Yu
- Jun Xu
Personalized retrieval-augmented generation (RAG) aims to produce user-tailored responses
by incorporating retrieved user profiles alongside the input query. Existing methods
primarily focus on improving retrieval and rely on large language models (LLMs) to
implicitly integrate the retrieved context with the query. However, such models are
often sensitive to retrieval quality and may generate responses that are misaligned
with user preferences. To address this limitation, we propose PrLM, a reinforcement
learning framework that trains LLMs to explicitly reason over retrieved user profiles.
Guided by a contrastively trained personalization reward model, PrLM effectively learns
from user responses without requiring annotated reasoning paths. Experiments on three
personalized text generation datasets show that PrLM outperforms existing methods
and remains robust across varying numbers of retrieved profiles and different retrievers.
Global-Distribution Aware Scenario-Specific Variational Representation Learning Framework
- Moyu Zhang
- Yujun Jin
- Jinxin Hu
- Yu Zhang
Current recommendation methods typically use a unified framework to offer personalized
recommendations for different scenarios provided by commercial platforms. However,
they often employ shared bottom representations, which partially hinders the model's
capacity to capture scenario uniqueness. Ideally, users and items should exhibit specific
characteristics in different scenarios, prompting the need to learn scenario-specific
representations to differentiate scenarios. Yet, variations in user and item interactions
across scenarios lead to data sparsity issues, impeding the acquisition of scenario-specific
representations. To learn robust scenario-specific representations, we introduce a
Global-Distribution Aware Scenario-Specific Variational Representation Learning Framework
(GSVR) that can be directly applied to existing multi-scenario methods. Specifically,
considering the uncertainty stemming from limited samples, our approach employs a
probabilistic model to generate scenario-specific distributions for each user and
item in each scenario, estimated through variational inference (VI). Additionally,
we introduce the global knowledge-aware multinomial distributions as prior knowledge
to regulate the learning of the posterior user and item distributions, ensuring similarities
among distributions for users with akin interests and items with similar side information.
This mitigates the risk of users or items with fewer records being overwhelmed in
sparse scenarios. Extensive experimental results affirm the efficacy of GSVR in learning
more robust representations.
SAKG: Structure-Aware Large Language Model Framework for Knowledge Graph Reasoning
- Qingyu Zhang
- Min Hu
- Wenlong Fei
- Jiaoyun Yang
- Hongbo Li
Existing approaches to leveraging knowledge graphs in large language models often
lack explicit structural modeling, which can lead to hallucinations and unstable reasoning
over graph data. To address this, we propose SAKG, a structure-aware prompting framework
designed to enhance the alignment between knowledge graph representations and language
model inference. SAKG employs a hierarchical prompting strategy that integrates explicit
task instructions, structural embeddings, and selectively filtered neighbor context.
In particular, we introduce a progressive neighbor selection mechanism that combines
relation co-occurrence statistics with embedding-based semantic similarity, ensuring
that only informative and relevant neighbors are included in the prompt. This design
enables the model to better capture relational semantics, structural dependencies,
and contextual cues within the graph. Experimental results on multiple benchmarks
demonstrate that SAKG consistently improves the effectiveness and factual consistency
of knowledge graph reasoning with large language models.
Relation-Sensitive Visual Aggregation Enhances Multimodal Knowledge Graph Completion
- Qingyu Zhang
- Min Hu
- Wenlong Fei
- Jiaoyun Yang
- Hongbo Li
Existing multimodal knowledge graph completion methods often overlook triple correlations
between relations and images, limiting the expressiveness of multimodal embeddings.
In this paper, we categorize triple correlations into Intra-triple Correlations (IaC)
and Inter-triple Correlations (IeC), and propose a method called Relation-Sensitive
Visual Aggregation (RSVA) to explicitly model them. Specifically, RSVA consists of
two modules. The first is the Visual Semantic Aggregation Module, aggregating visual
features for the central entity considering IaC. The second is the Contextual Neighbor
Aggregation Module, capturing IeC by aggregating visual semantics from neighboring
entities. In link prediction experiments, RSVA demonstrates the effect of IaC and
IeC on the embeddings of central entities and achieves improved performance compared
with previous approaches. These results demonstrate the effectiveness of RSVA, indicating
that explicitly modeling the latent correlations between relations and images can
enhance the representational capability of multimodal knowledge graphs.
Interpretable Meta-weighting Sparse Neural Additive Networks for Datasets with Label
Noise and Class Imbalance
- Xuelin Zhang
- Hong Chen
- Lingjuan Wu
Black-box neural networks are inherently inscrutable, and their widespread use has
triggered significant societal issues in crucial areas such as healthcare, finance
and safety. In these high-stakes decision-making domains, the deployment of machine
learning algorithms requires not only prediction accuracy but also their interpretability
and robustness against data distribution shifts, such as outliers, label noise, and
category imbalance. In this work, we propose a novel Meta-weighted Sparse Neural Additive
Model (MSpNAM), which offers robustness through an efficient bilevel weighting policy
and inherits strong explainability and representation capabilities from the additive
modeling strategy. Furthermore, empirical results across multiple synthetic and real
datasets, under various distribution shifts, demonstrate that MSpNAM can scale effectively
and achieve superior performance in terms of robustness, interpretability, and anti-forgetting
compared to some of the latest baselines.
A Robust Entity Alignment Method based on Knowledge Distillation with Noisy Aligned
Pairs
- Yuhong Zhang
- Hangchi Song
- Xiaolong Zhu
- Chenyang Bu
- Kui Yu
Entity alignment (EA) aims to find the same entities across different knowledge graphs. Existing EA methods assume that the supervised aligned pairs are noise-free; in real applications, noisy pairs degrade EA performance. To this end, we propose a robust EA method based on knowledge distillation for noisy pairs. First, a dual-teacher model with online distillation is designed, in which a noise discriminator improves the noise resistance of the teacher models. Second, a student model is distilled offline from the dual-teacher model without using the noisy supervised pairs, further enhancing the robustness of the student model. In addition, the entity structure is combined with the entity representation for alignment inference to alleviate the bias of entity representations in noisy environments. Extensive experiments demonstrate the effectiveness of the proposed method.
Student-Augmented Self-Training with Closed Loop Feedback in Linguistic Steganalysis
- Ziwei Zhang
- Juan Wen
- Wanli Peng
- Haowei Chang
As a countermeasure to linguistic steganography, linguistic steganalysis aims to distinguish
between texts containing hidden secret messages (stego) and natural texts (cover).
Current semi-supervised steganalysis approaches rely on a self-training mechanism.
However, in linguistic steganalysis, pseudo-label errors propagate through iterative
training, causing the student model to reinforce incorrect stego-distribution associations,
thereby impairing its discriminative ability for subtle linguistic perturbations.
To address this challenge, we propose SALT-LS, a self-training linguistic steganalysis framework that integrates a closed feedback
loop between student and teacher models alongside a dual-constraint mechanism to improve
pseudo-labels. Unlike a conventional semi-supervised steganalysis approach, we compute
prototype penalties from both same-class (rather than intra-class) and cross-class
perspectives, enabling more effective use of labeled data. Furthermore, we introduce
an advanced-updating strategy for the student model, which is combined with the dual-constraint
mechanism, forming a closed feedback loop that continuously refines the teacher model's
pseudo-label generation for robust steganalysis performance. Extensive experiments
on six datasets, including widely used steganographic strategies and corpora, demonstrate
that SALT-LS outperforms state-of-the-art models. Our code is available.
Non-autoregressive Generative Auction with Global Externalities for Online Advertising
- Zuowu Zheng
- Ze Wang
- Fan Yang
- Wenqing Ye
- Weihua Huang
- Wenqiang He
- Teng Zhang
- Xingxing Wang
Online advertising auctions play a critical role in internet commerce, requiring mechanisms
that maximize revenue while ensuring incentive compatibility, user experiences, and
real-time efficiency. Existing learning-based auction frameworks advance contextual
modeling by considering intra-list dependencies among ads, but still face challenges
of insufficient global externality modeling and inefficiencies due to sequential processing.
In this paper, we propose the Non-autoregressive Generative Auction with global externalities
(NGA), a novel end-to-end auction framework for industrial online advertising. NGA
explicitly models global externalities by jointly encoding dependencies among ads
and the influence of neighboring organic content. To achieve real-time efficiency,
NGA employs a non-autoregressive, constraint-based decoding mechanism and a parallel
multi-tower evaluator that unifies list-wise reward and payment computation. Extensive
offline experiments and large-scale online A/B tests on commercial advertising platforms
demonstrate that NGA achieves superior performance in both effectiveness and efficiency
compared to the state-of-the-art baselines.
MissDDIM: Deterministic and Efficient Conditional Diffusion for Tabular Data Imputation
- Youran Zhou
- Mohamed Reda Bouadjenek
- Sunil Aryal
Diffusion models have recently emerged as powerful tools for missing data imputation
by modeling the joint distribution of observed and unobserved variables. However,
existing methods, typically based on stochastic denoising diffusion probabilistic
models (DDPMs), suffer from high inference latency and variable outputs, limiting
their applicability in real-world tabular settings. To address these deficiencies,
we present in this paper MissDDIM, a conditional diffusion framework that adapts Denoising
Diffusion Implicit Models (DDIM) for tabular imputation. While stochastic sampling
enables diverse completions, it also introduces output variability that complicates
downstream processing. MissDDIM replaces this with a deterministic, non-Markovian
sampling path, yielding faster and more consistent imputations. To better leverage
incomplete inputs during training, we introduce a self-masking strategy that dynamically
constructs imputation targets from observed features, enabling robust conditioning
without requiring fully observed data. Experiments on five benchmark datasets demonstrate
that MissDDIM matches or exceeds the accuracy of state-of-the-art diffusion models,
while significantly improving inference speed and stability. These results highlight
the practical value of deterministic diffusion for real-world imputation tasks.
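The self-masking idea can be illustrated as follows: during training, a fraction of the observed entries is hidden and treated as the imputation target, so fully observed rows are never required. The mask ratio and array layout below are assumptions for illustration, not MissDDIM's exact recipe.

```python
# Illustrative self-masking step for imputation training (assumed details).
import numpy as np

def self_mask(x, observed_mask, ratio=0.2, rng=np.random.default_rng(0)):
    """x: (n, d) data with missing entries zero-filled; observed_mask: 1 where observed."""
    drop = (rng.random(x.shape) < ratio) & (observed_mask == 1)
    cond_mask = observed_mask.copy()
    cond_mask[drop] = 0            # these observed entries become prediction targets
    x_cond = x * cond_mask         # conditioning input seen by the model
    return x_cond, cond_mask, drop # drop marks where the training loss is computed

x = np.array([[1.0, 2.0, 0.0], [4.0, 0.0, 6.0]])
obs = np.array([[1, 1, 0], [1, 0, 1]])
print(self_mask(x, obs))
```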
Adversarially Attacking Graph Properties and Sparsification in Graph Learning
- Chunjiang Zhu
- Blake Gaines
- Jing Deng
- Jinbo Bi
Graph neural networks and graph transformers explicitly or implicitly rely on fundamental properties of the underlying graph, such as spectral properties
and shortest-path distances. However, it is still not clear how vulnerable these graph properties are to adversarial attacks and what impact this has on downstream graph learning. Moreover, while graph sparsification has been used to reduce the computational cost of learning over graphs, its susceptibility to adversarial attacks has not been studied. In this paper, we study adversarial attacks on graph properties and graph sparsification and their impact on downstream graph learning, paving the way for protecting against these potential attacks. Our proposed methods are effective
in attacking spectral properties, shortest distances, and graph sparsification as
demonstrated in our experimental evaluation.
Asymmetric Diffusion Recommendation Model
- Yongchun Zhu
- Guanyu Jiang
- Jingwu Chen
- Feng Zhang
- Xiao Yang
- Zuotao Liu
Recently, motivated by the outstanding achievements of diffusion models, the diffusion
process has been employed to strengthen representation learning in recommendation
systems. Most diffusion-based recommendation models typically utilize standard Gaussian noise in symmetric forward and reverse processes in continuous data space. Nevertheless, the samples
derived from recommendation systems inhabit a discrete data space, which is fundamentally
different from the continuous one. Moreover, Gaussian noise has the potential to corrupt
personalized information within latent representations. In this work, we propose a
novel and effective method, named Asymmetric Diffusion Recommendation Model (AsymDiffRec), which learns forward and reverse processes in an asymmetric manner. We define a
generalized forward process that simulates the missing features in real-world recommendation
samples. The reverse process is then performed in an asymmetric latent feature space.
To preserve personalized information within the latent representation, a task-oriented
optimization strategy is introduced. In the serving stage, the raw sample with missing
features is regarded as a noisy input to generate a denoised and robust representation
for the final prediction. By equipping base models with AsymDiffRec, we conduct online
A/B tests, achieving improvements of +0.131% and +0.166% in terms of users' active
days and app usage duration respectively. Additionally, the extended offline experiments
also demonstrate improvements. AsymDiffRec has been implemented in the Douyin Music
App.
Active Recommendation for Email Outreach Dynamics
- Čeněk Žid
- Rodrigo Alves
- Pavel Kordík
Email outreach remains a cornerstone of modern marketing, enabling direct, timely
communication. However, this strategy faces significant personalization challenges,
since new campaigns typically lack historical interaction data and rich side information.
In this work, we propose a framework that combines collaborative-filtering (CF) signals
derived from a shallow autoencoder (SAE) with a Thompson Sampling-based multi-armed
bandit to dynamically select small batches of recipients for each email template.
We show SAEs help balance exploration and exploitation by quantifying recipient informativeness
and confidence, enabling efficient personalization without retraining during active
learning. To facilitate reproducibility and future research, we release a large dataset
of almost 15 million recipient-message interactions, offering new insights into email
outreach dynamics for CF. Our experiments show that our method outperforms multiple
baselines in retrieval metrics while retaining interpretable model components.
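For intuition on the bandit component, here is a minimal Beta-Bernoulli Thompson Sampling loop for choosing an email template per interaction; the collaborative-filtering and SAE-based informativeness signals described in the abstract are omitted, and the priors and click rates are illustrative assumptions.

```python
# Minimal Beta-Bernoulli Thompson Sampling for template selection (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n_templates = 3
alpha = np.ones(n_templates)   # prior successes (e.g., clicks)
beta = np.ones(n_templates)    # prior failures

def choose_template():
    samples = rng.beta(alpha, beta)   # one posterior draw per template
    return int(np.argmax(samples))

def update(template, clicked):
    if clicked:
        alpha[template] += 1
    else:
        beta[template] += 1

# Simulated interaction loop with hidden (assumed) click rates.
true_rates = [0.02, 0.05, 0.03]
for _ in range(1000):
    t = choose_template()
    update(t, rng.random() < true_rates[t])
print(alpha, beta)
```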
Structuring Video Semantics with Temporal Triplets for Zero-Shot Video Question Answering
- Linlin Zong
- Xinyu Zhai
- Xinyue Liu
- Wenxin Liang
- Xianchao Zhang
- Bo Xu
Current large vision-language models (VLMs) exhibit remarkable performance in basic
video understanding tasks. However, existing VLMs are still limited to surface-level
perception and lack fine-grained spatio-temporal understanding and combinatorial reasoning
capabilities. Existing methods typically rely on expensive human annotations or subtitle
extraction, yet they struggle to effectively model temporal relations between frames.
This paper proposes a structured representation based on temporal triplets to address
two major challenges in traditional approaches: temporal fragmentation and entity
reference ambiguity. By modeling objects, attributes, and relationships within the
video and incorporating temporal information, we convert semantic content from keyframes
into a sequence of temporal triplets. This structured representation is then used
as input for zero-shot video question answering (VideoQA). Experiments were conducted
on four benchmark VideoQA datasets: NExT-QA, STAR, MSVD-QA, and MSRVTT-QA, showing
that our method achieves competitive performance without requiring fine-tuning, validating
its generality and effectiveness.
SESSION: Applied Research Papers
EnhanceMyPrompt: Rewriting Chat Queries for Effective Response Generation from LLMs
- Tushar Abhishek
- Manas Jain
- Shishir Hardia
- Shreevignesh Suriyanarayanan
- Sandra Anil
- Rushabh Gandhi
- Manish Gupta
Short and ambiguous queries in chat interfaces like Microsoft Copilot often lead to
vague or irrelevant LLM responses, increasing task completion time. Hence, we introduce
a novel problem: semi-automatically enhancing such queries/prompts into specific,
well-formed ones with clear intent. Unlike prompt optimization, our approach adds
relevant sub-intents or constraints rather than just rewording for brevity. We propose
EnhanceMyPrompt, which uses small language models (SLMs) to enrich prompts by adding sub-intents/constraints,
suggesting placeholders, and recommending popular values. We also introduce metrics
to measure prompt improvement, user effort, and LLM response quality. Experiments
on a proprietary Microsoft Copilot dataset and the LMSYS+NQ dataset with four SLMs show its effectiveness:
EnhanceMyPrompt predicts user intents up to 3 turns ahead in ~23% of conversations, enabling efficient
sessions. Code, prompts, data, and models for LMSYS+NQ are publicly available.
FR-LoRA: Fisher Regularized LoRA for Multilingual Continual Learning
- Sayanta Adhikari
- Sanjay Agrawal
- Vivek Sembium
Relevance in e-commerce product search is critical to ensuring that results accurately
reflect customer intent. While large language models (LLMs) have recently advanced
natural language processing capabilities, their high inference latency and significant
infrastructure demands make them less suitable for real-time e-commerce applications.
Consequently, transformer-based encoder models are widely adopted for relevance classification
tasks. These models typically evaluate the relevance of a product to a given query
by encoding the query and product title as input features. As e-commerce stores expand
into new marketplaces, the need for language- and region-specific relevance models
grows, often resulting in the sequential development and maintenance of separate models
per marketplace. To address this challenge, we introduce a multilingual continual
learning (CL) framework that mitigates catastrophic forgetting. Our proposed method,
FR-LoRA (Fisher Regularized LoRA), integrates Elastic Weight Consolidation (EWC) with marketplace-specific
LoRA modules, where each LoRA is regularized using the Fisher information matrix.
FR-LoRA retains the same inference-time footprint as the base model, ensuring zero
additional latency while enabling frequent, scalable updates. Empirically, our approach
achieves a ~3% ROC-AUC improvement over single-marketplace baselines and outperforms several recent CL baselines on
both proprietary and public datasets.
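A minimal sketch of the Fisher-regularized objective: an EWC-style quadratic penalty, weighted by a diagonal Fisher estimate, is added on the LoRA parameters relative to their previous-marketplace values. Parameter names and the regularization strength are illustrative assumptions, not the paper's settings.

```python
# EWC-style Fisher-weighted penalty on LoRA parameters (illustrative sketch).
import torch

def fr_lora_loss(task_loss, lora_params, old_params, fisher_diag, lam=0.1):
    """task_loss: scalar loss on the new marketplace.
    lora_params / old_params / fisher_diag: dicts keyed by parameter name."""
    penalty = 0.0
    for name, p in lora_params.items():
        penalty = penalty + (fisher_diag[name] * (p - old_params[name]) ** 2).sum()
    return task_loss + lam * penalty

# Toy usage with a single fake LoRA matrix.
p = {"lora_A": torch.randn(4, 2, requires_grad=True)}
old = {"lora_A": p["lora_A"].detach().clone() + 0.01}
fisher = {"lora_A": torch.ones(4, 2)}
loss = fr_lora_loss(torch.tensor(0.5), p, old, fisher)
loss.backward()
print(loss.item())
```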
Uncovering Corporate Influence: A First Scalable Method for Qualifying Holdings Computation
- Livia Blasi
- Matteo Brandetti
- Costanza Catalano
- Andrea Gentili
- Davide Magnanimi
Distributed logic-based reasoning has recently emerged as a powerful, explainable,
and auditable approach for solving complex problems across a wide range of domains.
Based on our experience in building company ownership graphs at the Bank of Italy,
this paper presents a distributed reasoning solution to the qualifying holding problem.
Qualifying holdings measure an entity's influence over a bank or financial intermediary
and serve as a key metric in banking supervision. Their calculation is particularly
challenging due to the large scale of real-world ownership networks and their structural
complexity---the latter often not fully addressed by existing regulatory frameworks.
To our knowledge, no standardised, efficient computational approach that also faithfully
reflects regulatory interpretations has ever been proposed, leaving a significant
gap between legal requirements and practical capabilities. We fill this gap by proposing
the first mathematical formalisation of the qualifying holding problem, proving it
to be inherently hard (#P-complete). Despite its general intractability, we develop
a logic-based reasoning algorithm based on Datalog± that enables efficient and parallel
computation of qualifying holdings across real-world scenarios. Extensive experiments
confirm its practical effectiveness and scalability.
Pantheon: Personalized Multi-objective Ensemble Sort via Iterative Pareto Policy Optimization
- Jiangxia Cao
- Pengbo Xu
- Yin Cheng
- Kaiwei Guo
- Jian Tang
- Shijun Wang
- Dewei Leng
- Shuang Yang
- Zhaojie Liu
- Yanan Niu
- Guorui Zhou
- Kun Gai
To provide promising recommendation results, the industrial RecSys chain supporting our service consists of three major stages: (1) the Retrieval model first searches for hundreds of candidate items; (2) the Ranking model then estimates multiple aspect probabilities (Pxtrs) for each retrieved item; and (3) the Ensemble Sort stage finally merges those Pxtrs into one comparable score and selects the dozen items with the highest scores to recommend. To our knowledge, the widely accepted industry ensemble sort approach still relies on manual formula-based adjustment, i.e., assigning manual weights to Pxtrs to control their influence on the fusion score. Under this framework, the RecSys relies heavily on expert knowledge to determine a satisfactory weight for each Pxtr, which blocks further advancement.
In this paper, we present Pantheon, a practical neural-network-based ensemble sort. Compared with formula-based ensemble sort, Pantheon has the following advantages: (1) Personalized joint training: Pantheon is trained jointly with the real-time ranking model, allowing it to accurately capture ever-changing personalized user interests. (2) Representation inheritance: instead of the highly compressed Pxtrs, Pantheon takes the Ranking model's fine-grained hidden states as input, which enriches its model capacity. Meanwhile, to reach a balanced multi-objective ensemble sort, we further devise an iterative Pareto policy optimization (IPPO) strategy that considers the multiple objectives simultaneously. To our knowledge, this paper presents the first work to replace the entire formula-based ensemble sort in an industrial RecSys; Pantheon is fully deployed in Kuaishou live-streaming services, serving 400 million users daily.
T-Stars-Poster: A Framework for Product-Centric Advertising Image Design
- Hongyu Chen
- Min Zhou
- Jing Jiang
- Jiale Chen
- Yang Lu
- Zihang Lin
- Bo Xiao
- Tiezheng Ge
- Bo Zheng
Creating advertising images is often a labor-intensive and time-consuming process.
Can we automatically generate such images using basic product information like a product
foreground image, taglines, and a target size? Existing methods mainly focus on parts
of the problem and lack a comprehensive solution. To bridge this gap, we propose a
novel product-centric framework for advertising image design called T-Stars-Poster.
It consists of four sequential stages to highlight product foregrounds and taglines
while achieving overall image aesthetics: prompt generation, layout generation, background
image generation, and graphics rendering. Different expert models are designed and
trained for the first three stages: First, a visual language model (VLM) generates
background prompts that match the products. Next, a VLM-based layout generation model
arranges the placement of product foregrounds, graphic elements (taglines and decorative
underlays), and various nongraphic elements (objects from the background prompt).
Following this, an SDXL-based model can simultaneously accept prompts, layouts, and
foreground controls to generate images. To support T-Stars-Poster, we create two corresponding
datasets with over 50,000 labeled images. Extensive experiments and online A/B tests
demonstrate that T-Stars-Poster can produce more visually appealing advertising images.
Let Topology Speak: Graph Neural Network with Topology-Aware Augmentation
- Kangzhuo Chen
- Xiaoqian Sun
- Huawei Shen
- Xueqi Cheng
Company financial risk is widespread, and accurate prediction is critical to avoiding
significant losses. Many risky companies often exhibit subtle anomalies, incomplete
information, or limited interactions with others; however, the types of their interactions remain diverse and informative. Existing methods like metapath-based Graph Neural Networks effectively leverage node relationships but are constrained by manual bias, introduced noise, and high computational complexity. Similarly, Graph Transformers
show strong performance but suffer from prohibitively high complexity. To overcome
these challenges, we propose the Graph Neural Network with Topology-Aware Augmentation
(GTA). GTA adopts a dual augmentation strategy based on topology information, augmenting
both topology and attributes. It first performs unification encoding on single node-type
heterogeneous graphs, integrating heterogeneous topology into node representations.
Expressive topology encoding is performed, followed by dual augmentation based on
the learned topology embeddings. Through this approach, GTA achieves effective risk
prediction. Extensive experiments on a real-world dataset demonstrate GTA's superior performance compared to state-of-the-art metapath-based and graph transformer-based methods, effectively handling sparse graphs with a single node type and multiple edge types. A comprehensive ablation study and visual analysis further validate the discriminative power of topology augmentation in distinguishing risky companies. Our code is publicly available at https://github.com/ckz123/GTA.
See Beyond a Single View: Multi-Attribution Learning Leads to Better Conversion Rate
Prediction
- Sishuo Chen
- Zhangming Chan
- Xiang-Rong Sheng
- Lei Zhang
- Sheng Chen
- Chenghuan Hou
- Han Zhu
- Jian Xu
- Bo Zheng
Conversion rate (CVR) prediction is a core component of online advertising systems,
where the attribution mechanisms, i.e., the rules for allocating conversion credit across user touchpoints, fundamentally determine
label generation and model optimization. While many industrial platforms support diverse
attribution mechanisms (e.g., First-Click, Last-Click, Linear, and Data-Driven Multi-Touch
Attribution), conventional approaches restrict model training to labels from a single
production-critical attribution mechanism, discarding complementary signals in alternative
attribution perspectives.
To address this limitation, we propose a novel Multi-Attribution Learning (MAL) framework for CVR prediction that integrates signals from multiple attribution perspectives
to better capture the underlying patterns driving user conversions. Specifically,
MAL is a joint learning framework consisting of two core components: the Attribution
Knowledge Aggregator (AKA) and the Primary Target Predictor (PTP). AKA is implemented as a multi-task learner that integrates knowledge extracted
from diverse attribution labels. PTP, in contrast, focuses on the task of generating
well-calibrated conversion probabilities that align with the system-optimized attribution
metric (e.g., CVR under the Last-Click attribution), ensuring direct compatibility
with industrial deployment requirements. Additionally, we propose CAT, a novel training
strategy that leverages the Cartesian product of all attribution label combinations
to generate enriched supervision signals. This design substantially enhances the performance
of the attribution knowledge aggregator. Empirical evaluations demonstrate the superiority
of MAL over single-attribution learning baselines, achieving a +0.51% GAUC improvement on
offline metrics. Online experiments demonstrate that MAL achieved a +2.6% increase in ROI (Return on Investment).
Personalized Tree-Based Progressive Regression Model for Watch-Time Prediction in
Short Video Recommendation
- Xiaokai Chen
- Xiao Lin
- Changcheng Li
- Peng Jiang
In online video platforms, accurate watch time prediction has become a fundamental
and challenging problem in video recommendation. Previous research has revealed that
the accuracy of watch time prediction highly depends on both the transformation of
watch-time labels and the decomposition of the estimation process. TPM (Tree-based Progressive Regression Model) achieves state-of-the-art performance with a carefully
designed and effective decomposition paradigm. TPM discretizes the watch time into
several ordinal intervals and organizes them into a binary decision tree, where each
node corresponds to a specific interval. At each non-leaf node, a binary classifier
is used to determine the specific interval in which the watch time variable most likely
falls, based on the prediction outcome at its parent node.
The tree structure is central to TPM, as it defines the decomposition of watch time
estimation and how ordinal intervals are discretized. However, TPM uses a predefined
full binary tree, which may be sub-optimal for two reasons. First, full binary trees
imply equal partitioning of the watch time space, which may fail to capture the complexity
of real-world distributions. Second, rather than relying on a fixed global structure,
we advocate for a personalized, data-driven tree that can be learned end-to-end. Thus,
we propose PTPM to enable highly personalized decomposition of watch-time estimation with better efficacy and efficiency. Moreover, we show that TPM suffers from selection
bias due to conditional modeling and propose a simple solution. We conduct extensive
experiments on offline datasets and online environments. Offline results show improved
watch time accuracy, and online A/B tests further validate the effectiveness of our
framework. PTPM has been fully deployed in core traffic scenarios and now serves over
400 million users daily.
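To illustrate the tree-structured decomposition that TPM and PTPM build on, the sketch below computes an expected watch time by traversing a small binary tree of ordinal intervals, where each internal node supplies a classifier probability of taking the right branch; the tree, probabilities, and interval midpoints are toy values, not learned quantities from the paper.

```python
# Expected watch time from a binary tree of ordinal intervals (toy illustration).
def expected_watch_time(node, p_right):
    """node: nested tuple tree; leaves are interval midpoints (seconds).
    p_right: dict mapping internal node id -> probability of taking the right branch."""
    def expect(n):
        if not isinstance(n, tuple):          # leaf: interval midpoint
            return float(n)
        node_id, left, right = n
        p = p_right[node_id]
        return (1 - p) * expect(left) + p * expect(right)
    return expect(node)

# A depth-2 tree over four ordinal intervals with midpoints 5s, 20s, 60s, 180s.
tree = ("root", ("l", 5, 20), ("r", 60, 180))
print(expected_watch_time(tree, {"root": 0.4, "l": 0.7, "r": 0.2}))
```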
Neighbor-enhanced Graph Pre-training and Prompt Learning Framework for Fraud Detection
- Ziyang Cheng
- Jie Yang
- Yixin Song
- Dawei Cheng
- Guang Yang
- Bo Wang
Nowadays, as more users turn to WeChat Pay and other e-commerce platforms for transactions,
an increasing number of fraudsters are being attracted to these platforms to conduct
fraudulent activities, thereby stealing money. To address this issue, Graph Neural
Networks (GNNs) have been widely adopted and have shown great success. However, with
the rise of various transaction methods, users are increasingly engaging in multiple
transaction networks, which creates a new scenario that requires models to detect
fraud across these diverse networks. Unfortunately, current GNN-based fraud detection
strategies often exhibit suboptimal performance and high time complexity in this evolving
scenario, as they typically can handle only one transaction network at a time. Recently,
advancements in graph prompt learning have demonstrated great success in managing
various types of graph data and improving the generalization capabilities of the model,
showing great promise for addressing this new fraud detection scenario. Nevertheless,
the practical application of graph prompt learning in real-world fraud detection is still constrained, as existing methods may exhibit bias when dealing with multiplex transaction
networks and may fail to model the intrinsic relationships between nodes and their
neighbors, which is crucial for effective fraud detection. To address these two challenges,
we propose GPCF, an efficient graph pre-training and prompt learning framework. GPCF
first incorporates a meta-learning-based strategy within neighbor-enhanced contrastive
learning to pre-train the GNN model across diverse transaction networks. Then it aligns
fraud detection tasks with the well-pre-trained model by simply fine-tuning the prompts.
Extensive experiments demonstrate that GPCF achieves state-of-the-art results on open-access
fraud and transaction datasets, as well as on real-world fraud datasets from WeChat
Pay, one of the largest e-commerce platforms globally, showing the effectiveness of
GPCF in practical applications.
Reference-Aligned Retrieval-Augmented Question Answering over Heterogeneous Proprietary
Documents
- Nayoung Choi
- Grace Byun
- Andrew Chung
- Ellie S. Paek
- Shinsun Lee
- Jinho D. Choi
Proprietary corporate documents contain rich domain-specific knowledge, but their
overwhelming volume and disorganized structure make it difficult even for employees
to access the right information when needed. For example, in the automotive industry,
vehicle crash-collision tests, each costing hundreds of thousands of dollars, produce highly detailed documentation. However, retrieving relevant content during decision-making
remains time-consuming due to the scale and complexity of the material. While Retrieval-Augmented
Generation (RAG)-based Question Answering (QA) systems offer a promising solution,
building an internal RAG-QA system poses several challenges: (1) handling heterogeneous
multi-modal data sources, (2) preserving data confidentiality, and (3) enabling traceability
between each piece of information in the generated answer and its original source
document. To address these, we propose a RAG-QA framework for internal enterprise
use, consisting of: (1) a data pipeline that converts raw multi-modal documents into
a structured corpus and QA pairs, (2) a fully on-premise, privacy-preserving architecture,
and (3) a lightweight reference matcher that links answer segments to supporting content.
Applied to the automotive domain, our system improves factual correctness (+1.79,
+1.94), informativeness (+1.33, +1.16), and helpfulness (+1.08, +1.67) over a non-RAG
baseline, based on 1-5 scale ratings from both human and LLM judges. The system was
deployed internally for pilot testing and received positive feedback from employees.
When Words Can't Capture It All: Towards Video-Based User Complaint Text Generation
with Multimodal Video Complaint Dataset
- Sarmistha Das
- R E Zera Marveen Lyngkhoi
- Kirtan Jain
- Vinayak Goyal
- Sriparna Saha
- Manish Gupta
While there exists a lot of work on explainable complaint mining, articulating user
concerns through text or video remains a significant challenge, often leaving issues
unresolved. Users frequently struggle to express their complaints clearly in text
but can easily upload videos depicting product defects (e.g., vague text such as 'worst
product' paired with a 5-second video depicting a broken headphone with the right
earcup). This paper formulates a new task in the field of complaint mining to help everyday users write expressive complaints: Complaint Description from Videos (CoD-V) (e.g., to help the above user articulate her complaint about the defective right earcup). To this end, we introduce ComVID, a video complaint dataset
containing 1,175 complaint videos and the corresponding descriptions, also annotated
with the emotional state of the complainer. Additionally, we present a new complaint retention (CR) evaluation metric that distinguishes the proposed CoD-V task from standard video summarization and description tasks. To strengthen this initiative,
we introduce a multimodal Retrieval-Augmented Generation (RAG) embedded VideoLLaMA2-7b
model, designed to generate complaints while accounting for the user's emotional state.
We conduct a comprehensive evaluation of several Video Language Models on several
tasks (pre-trained and fine-tuned versions) with a range of established evaluation
metrics, including METEOR, perplexity, and the Coleman-Liau readability score, among
others. Our study lays the foundation for a new research direction to provide a platform
for users to express complaints through video. Dataset and resources are available
at: https://github.com/sarmistha-D/CoD-V.
RottenReviews: Benchmarking Review Quality with Human and LLM-Based Judgments
- Sajad Ebrahimi
- Soroush Sadeghian
- Ali Ghorbanpour
- Negar Arabzadeh
- Sara Salamat
- Muhan Li
- Hai Son Le
- Mahdi Bashari
- Ebrahim Bagheri
The quality of peer review plays a critical role in scientific publishing, yet remains
poorly understood and challenging to evaluate at scale. In this work, we introduce
RottenReviews, a benchmark designed to facilitate systematic assessment of review
quality. RottenReviews comprises over 15,000 submissions from four distinct academic
venues enriched with over 9,000 reviewer scholarly profiles and paper metadata. We
define and compute a diverse set of quantifiable review-dependent and reviewer-dependent
metrics, and compare them against structured assessments from large language models
(LLMs) and expert human annotations. Our human-annotated subset includes over 700
paper-review pairs labeled across 13 explainable and conceptual dimensions of review
quality. Our empirical findings reveal that LLMs, both zero-shot and fine-tuned, exhibit
limited alignment with human expert evaluations of peer review quality. Surprisingly,
simple interpretable models trained on quantifiable features outperform fine-tuned
LLMs in predicting overall review quality. We publicly release all data, code, and
models at https://github.com/Reviewerly-Inc/RottenReviews to support further research
in this area.
Building a Virtual Member of a Community of Practice
- Joshua Eckroth
- Dayne Freitag
- Jonathan Keefe
- Timothy Meyer
- Karen L. Myers
- Eric Schoen
- Pedro Sequeira
- Reid G. Smith
We describe a virtual member of a knowledge management Community of Practice (CoP),
called ATHENA, that knows an individual, his tasks, his organization, and the community.
ATHENA employs an agentic chat capability that combines embeddings with knowledge-based
faceted search to provide accurate responses to technical questions along with rationale
and citations for efficient validation. ATHENA supports natural, in-the-flow capture
of task-related insights to share within a CoP, along with proactive dissemination
of information tied to an individual and his current needs. An evaluation involving
75 professionals from the Oil & Gas sector shows that ATHENA dramatically improved
outcomes and productivity on a set of well-planning tasks compared to their use of
a state-of-the-art RAG baseline. Interestingly, ATHENA also enabled eight non-experts
to perform at expert levels.
NeighSqueeze: Compact Neighborhood Grouping for Efficient Billion-Scale Heterogeneous
Graph Learning
- Xinyue Feng
- Shuxin Zhong
- Jinquan Hang
- Yuequn Zhang
- Guang Yang
- Haotian Wang
- Desheng Zhang
- Guang Wang
The rapid growth of online shopping has intensified competition among logistics companies,
highlighting the importance of customer expansion, i.e., identifying customers willing
to establish long-term contracts. Although existing approaches frame customer expansion
as a node classification task using heterogeneous graph learning to capture complex
interactions between a customer and other items, it is computationally infeasible
to utilize all neighboring interactions on large-scale logistics graphs. Current sub-sampling
methods reduce computational load by sampling a small part of neighborhood for training.
However, they introduce substantial information loss, particularly affecting high-degree
nodes and decreasing predictive accuracy. To address this, we introduce NeighSqueeze, a novel approach that groups structurally and semantically similar nodes, substantially reducing the neighbor count and facilitating full-neighbor learning. NeighSqueeze
consists of three modules designed to efficiently and effectively enable node grouping
on billion-scale heterogeneous graphs: (1) Structure-tightness-based neighbor filtering
reduces the high redundancy and complexity in similarity computations. (2) Hybrid
similarity graph construction addresses the difficulty of measuring node similarity
at scale; and (3) A two-level grouping strategy resolves the label dominance issue
within groups. We evaluate NeighSqueeze on JD Logistics, one of the largest logistics
companies in China. Compared with sub-sampling methods, our NeighSqueeze exhibits
lower runtime and memory usage with full-neighbor training on the compressed graph,
while simultaneously improving average precision by over 28.9% in offline evaluation and increasing the new customer exploration rate by 18.6% in online A/B testing.
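As a rough illustration of the grouping idea (not the NeighSqueeze algorithm itself), the sketch below merges neighbors whose embeddings exceed a cosine-similarity threshold using a simple union-find; the threshold and the toy embeddings are assumptions.

```python
# Illustrative sketch: collapse structurally/semantically similar neighbors into groups.
# The similarity threshold and union-find grouping are toy choices, not NeighSqueeze.
import numpy as np

def group_similar_nodes(embeddings, threshold=0.9):
    """Union-find over pairs whose cosine similarity exceeds the threshold."""
    n = len(embeddings)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    for i in range(n):
        for j in range(i + 1, n):
            if sims[i, j] > threshold:
                parent[find(i)] = find(j)   # merge the two groups
    return [find(i) for i in range(n)]      # group ID per node

rng = np.random.default_rng(4)
emb = np.repeat(rng.normal(size=(3, 8)), 4, axis=0) + 0.01 * rng.normal(size=(12, 8))
print(group_similar_nodes(emb))             # 12 neighbors collapse into roughly 3 groups
```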
Converted Data is All You Need for Causal Optimization of e-Commerce Promotions
- Dmitri Goldenberg
- Hugo Manuel Proença
- Amit Livne
- Felipe Moraes
- Javier Albert
- Bracha Shapira
Promotional campaigns are essential drivers of customer engagement and revenue in
e-commerce. Maintaining these campaigns within budget constraints requires targeted
allocation, traditionally achieved through causal uplift models that rely on vast
datasets of user interactions, including non-converted sessions, which introduce challenges
such as noisy data, attribution complexity and imbalanced outcomes. We propose a novel
approach using converted-only data, which reduces training data size, simplifies attribution,
improves efficiency, and mitigates the impact of non-converted interactions. We present
a generalized framework for budget-constrained promotion allocation with converted-only
data and validate it through a benchmarking study and multiple large-scale deployments
at Booking.com, positively impacting the experience of millions of customers worldwide.
Our results demonstrate that the proposed method is competitive with standard modeling
approaches and, in some cases, significantly outperforms them.
SSH-T3 : A Hierarchical Pre-training Framework for Multi-Scenario Financial Risk Assessment
- Zehao Gu
- Yateng Tang
- Jiarong Xu
- Zhang Siwei
- Xuehao Zheng
- Xi Chen
- Yun Xiong
Efficiently modeling user behavior on online payment platforms is crucial for accurately
identifying potential financial risks. With the rapid growth of online payment platforms,
the volume of user transaction data has significantly increased. Moreover, users'
payment behaviors often encompass diverse activities and interactions across multiple
scenarios. Based on observations from online payment platforms, we identify three
key challenges: scarce labels and poor representation robustness, long user payment
behavior sequences, and complex and heterogeneous amount-aware scenarios.
To address these challenges, we propose a novel Self-Supervised Hierarchical Two-Tower
Transformer (SSH-T3), specifically designed for multi-scenario financial risk assessments. We introduce
a masked modeling pre-training task to reconstruct multi-scenario day-level transaction
amount distributions, effectively mitigating behavior-level noise and enhancing representation
robustness. Additionally, we propose a hierarchical Multi-Scenario Payment Behavior
Sequence (MS-PBS) modeling approach tailored to business needs, which significantly
reduces complexity while capturing user behavior patterns more effectively through
day-level representations. Furthermore, we highlight the critical importance of correlating
multi-scenario data in MS-PBS modeling to better identify defaulter patterns. To this
end, we design a Two-Tower Transformer equipped with a specialized attention mechanism
that captures intricate user patterns across scenarios. Extensive experiments conducted
on both offline and online real-world business datasets demonstrate the effectiveness
and applicability of SSH-T3.
MISS: Multi-Modal Tree Indexing and Searching with Lifelong Sequential Behavior for
Retrieval Recommendation
- Chengcheng Guo
- Junda She
- Kuo Cai
- Shiyao Wang
- Qigen Hu
- Qiang Luo
- Guorui Zhou
- Kun Gai
Large-scale industrial recommendation systems typically employ a two-stage paradigm
of retrieval and ranking to handle huge amounts of information. Recent research focuses
on improving the performance of retrieval model. A promising way is to introduce extensive
information about users and items. On one hand, lifelong sequential behavior is valuable.
Existing lifelong behavior modeling methods in the ranking stage focus on the interaction between lifelong behavior and candidate items from the retrieval stage. In the retrieval stage, it is difficult to utilize lifelong behavior because of the large corpus of candidate items. On the other hand, existing retrieval methods mostly rely on interaction information, potentially disregarding valuable multi-modal information. To solve these problems, we present a pioneering exploration of leveraging multi-modal information and lifelong sequence modeling within an advanced tree-based retrieval model. We propose
Multi-modal Indexing and Searching with lifelong Sequence (MISS), which contains a
multi-modal index tree and a multi-modal lifelong sequence modeling module. Specifically,
for better index structure, we propose multi-modal index tree, which is built using
the multi-modal embedding to precisely represent item similarity. To precisely capture
diverse user interests in the user lifelong sequence, we propose a collaborative general search unit (Co-GSU) and a multi-modal general search unit (MM-GSU) for multi-perspective interest searching.
BiListing: Modality Alignment for Listings
- Guillaume Guy
- Mihajlo Grbovic
- Chun How Tan
- Han Zhao
Airbnb is a leader in offering travel accommodations. Airbnb has historically relied
on structured data to understand, rank, and recommend listings to guests due to the
limited capabilities and associated complexity arising from extracting meaningful
information from text and images. With the rise of representation learning, leveraging
rich information from text and photos has become easier. A popular approach has been
to create embeddings for text documents and images to enable use cases of computing
similarities between listings or using embeddings as features in an ML model.
However, an Airbnb listing has diverse unstructured data: multiple images, various
unstructured text documents such as title, description, and reviews, making this approach
challenging. Specifically, it is a non-trivial task to combine multiple embeddings
of different pieces of information, i.e. each image, each review, etc., to reach a
single meaningful listing representation, especially if some of the embeddings lie
in different spaces. Faced with such a problem, practitioners often resort to unprincipled
approaches of averaging embeddings to produce a single one. However, this often results
in an inaccurate representation due to loss of information in the averaging process.
This paper proposes BiListing, for Bimodal Listing, an approach to align text and
photos of a listing by leveraging large-language models and pretrained language-image
models. The BiListing approach has several favorable characteristics: capturing unstructured
data into a single embedding vector per listing and modality, enabling zero-shot capability
to search inventory efficiently in user-friendly semantics, overcoming the cold start
problem, and enabling listing-to-listing search along a single modality, or both.
We conducted offline and online tests to leverage the BiListing embeddings in the
Airbnb search ranking model, and successfully deployed it in production, achieving a 0.425% NDCG gain and driving tens of millions in incremental revenue.
Out of Distribution Detection for Efficient Continual Learning in Quality Prediction
for Arc Welding
- Yannik Hahn
- Jan Voets
- Antonin Königsfeld
- Hasan Tercan
- Tobias Meisen
Modern manufacturing relies heavily on fusion welding processes, including gas metal
arc welding (GMAW). Despite significant advances in machine learning-based quality
prediction, current models exhibit critical limitations when confronted with the inherent
distribution shifts that occur in dynamic manufacturing environments. In this work,
we extend the VQ-VAE Transformer architecture, which previously demonstrated state-of-the-art performance in weld quality prediction, by leveraging its autoregressive loss as a
reliable out-of-distribution (OOD) detection mechanism. Our approach exhibits superior
performance compared to conventional reconstruction methods, embedding error-based
techniques, and other established baselines. By integrating OOD detection with continual
learning strategies, we optimize model adaptation, triggering updates only when necessary
and thereby minimizing costly labeling requirements. We introduce a novel quantitative
metric that simultaneously evaluates OOD detection capability while interpreting in-distribution
performance. Experimental validation in real-world welding scenarios demonstrates
that our framework effectively maintains robust quality prediction capabilities across
significant distribution shifts, addressing critical challenges in dynamic manufacturing
environments where process parameters frequently change. This research makes a substantial
contribution to applied artificial intelligence by providing an explainable and at
the same time adaptive solution for quality assurance in dynamic manufacturing processes, a
crucial step towards robust, practical AI systems in the industrial environment.
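A minimal sketch of the general mechanism described above, using an autoregressive model's negative log-likelihood as an OOD score with a quantile-calibrated threshold; the toy log-probability function and signals are stand-ins, not the paper's VQ-VAE Transformer.

```python
# Illustrative sketch (not the paper's model): the negative log-likelihood of an
# autoregressive model serves as an OOD score, thresholded on in-distribution data.
import numpy as np

def sequence_nll(model_logprob_fn, sequence):
    """Average negative log-likelihood of a sequence under an autoregressive model."""
    return -np.mean([model_logprob_fn(sequence[:t], sequence[t])
                     for t in range(1, len(sequence))])

# Stand-in AR model: assumes the next value stays near the previous one (demo only).
def toy_logprob(context, next_value):
    return -0.5 * (next_value - context[-1]) ** 2   # unnormalised Gaussian log-density

rng = np.random.default_rng(1)
in_dist = [np.cumsum(rng.normal(0, 0.1, 50)) for _ in range(100)]   # smooth "in-distribution" signals
shifted = [np.cumsum(rng.normal(0, 0.8, 50)) for _ in range(20)]    # distribution-shifted signals

scores_id = np.array([sequence_nll(toy_logprob, s) for s in in_dist])
threshold = np.quantile(scores_id, 0.95)   # calibrate on in-distribution data only

flags = [sequence_nll(toy_logprob, s) > threshold for s in shifted]
print(f"flagged {sum(flags)}/{len(flags)} shifted sequences; trigger adaptation only when flagged")
```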
Beyond Pairwise Learning-To-Rank At Airbnb
- Malay Haldar
- Daochen Zha
- Huiji Gao
- Liwei He
- Sanjeev Katariya
There are three fundamental asks from a ranking algorithm: it should scale to handle a large number of items, sort items accurately by their utility, and impose a total order on the items for logical consistency. But here's the catch---no algorithm can achieve
all three at the same time. We call this limitation the SAT theorem for ranking algorithms.
Given the dilemma, how can we design a practical system that meets user needs? Our
current work at Airbnb provides an answer, with a working solution deployed at scale.
We start with pairwise learning-to-rank (LTR) models---the bedrock of search ranking
tech stacks today. They scale linearly with the number of items ranked and perform
strongly on metrics like NDCG by learning from pairwise comparisons. They are at a
sweet spot of performance vs. cost, making them an ideal choice for several industrial
applications. However, they have a drawback---by ignoring interactions between items,
they compromise on accuracy.
To improve accuracy, we create a ''true'' pairwise LTR model---one that captures interactions
between items during pairwise comparisons. But accuracy comes at the expense of scalability
and total order, and we discuss strategies to counter these challenges.
Traveling further along the road to greater accuracy, we take each item in the search
result, and compare it against the rest of the items along two dimensions: (1) Superiority: How strongly do searchers prefer the given item over the remaining ones? (2) Similarity: How similar is the given item to all the other items? This forms the basis of our
''all-pairwise'' LTR framework, which factors in interactions across all items at
once. Looking at items on the search result page all together---superiority and similarity
combined---gives us a deeper understanding of what searchers truly want. We quantify
the resulting improvements in searcher experience through offline and online experiments
at Airbnb.
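The sketch below conveys the all-pairwise intuition in miniature: each item is compared against every other item via a superiority term and a similarity term, and the two are combined into one score. The scoring functions, trade-off weight, and random features are illustrative assumptions, not Airbnb's production model.

```python
# Rough sketch of "all-pairwise" scoring: superiority against all other items minus a
# redundancy (similarity) penalty. Placeholder functions, not the deployed Airbnb model.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def all_pairwise_scores(utilities, embeddings, alpha=0.5):
    """utilities: (n,) per-item utility; embeddings: (n, d) item features."""
    n = len(utilities)
    # Superiority: how often item i would win a pairwise comparison against the rest.
    superiority = np.array([
        np.mean([sigmoid(utilities[i] - utilities[j]) for j in range(n) if j != i])
        for i in range(n)
    ])
    # Similarity: average cosine similarity to the other items (redundancy proxy).
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    cos = normed @ normed.T
    similarity = (cos.sum(axis=1) - 1.0) / (n - 1)
    # Combine: prefer items that win comparisons but keep the result page diverse.
    return superiority - alpha * similarity

rng = np.random.default_rng(2)
scores = all_pairwise_scores(rng.normal(size=8), rng.normal(size=(8, 16)))
print(np.argsort(-scores))   # final ordering over the 8 candidate listings
```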
Augmenting Limited and Biased RCTs through Pseudo-Sample Matching-Based Observational
Data Fusion Method
- Kairong Han
- Weidong Huang
- Taiyang Zhou
- Peng Zhen
- Kun Kuang
In the online ride-hailing pricing context, companies often conduct randomized controlled
trials (RCTs) and utilize uplift models to assess the effect of discounts on customer
orders, which substantially influences competitive market outcomes. However, due to
the high cost of RCTs, the proportion of trial data relative to observational data
is small, which only accounts for 0.65% of total traffic in our context, resulting
in significant bias when generalizing to the broader user base. Additionally, the
complexity of industrial processes reduces the quality of RCT data, which is often
subject to heterogeneity from potential interference and selection bias, making it
difficult to correct. Moreover, existing data fusion methods are challenging to implement
effectively in complex industrial settings due to the high dimensionality of features
and the strict assumptions that are hard to verify with real-world data. To address
these issues, we propose an empirical data fusion method called pseudo-sample matching.
By generating pseudo-samples from biased, low-quality RCT data and matching them with
the most similar samples from large-scale observational data, the method expands the
RCT dataset while mitigating its heterogeneity. We validated the method through simulation experiments and conducted offline and online tests using real-world data. In a week-long
online experiment, we achieved a 0.41% improvement in profit, which is a considerable
gain when scaled to industrial scenarios with hundreds of millions in revenue. In
addition, we discuss the harm to model training, offline evaluation, and online economic
benefits when the RCT data quality is not high, and emphasize the importance of improving
RCT data quality in industrial scenarios. Further details of the simulation experiments
can be found in the GitHub repository https://github.com/Kairong-Han/Pseudo-Matching.
Development of Autonomous Failure Maintenance System for Semiconductor Manufacturing
- Nuri Han
- Jiwon Seo
- Jonghee Ha
- Jihyung Oh
- Jinwoo Lee
- Boram Jeong
- Jongbin Park
- Gilhwan Kim
- Yohwan Joo
Semiconductor equipment failure analysis is critical in the fast-paced semiconductor
industry, where complex and sensitive components are susceptible to failures that
can lower productivity, raise costs, and shorten equipment lifespan. However, conventional failure analysis methods rely on time series-based fault detection and classification to identify the cause of failure, and are limited by their reliance on experts for appropriate corrective actions. We present a fully deployed autonomous maintenance system that closes the
failure recovery loop, from detecting anomalies to executing corrective actions, by
integrating graph-based log analysis, LLM-based semantic reasoning and hybrid retrieval
mechanisms. Deployed across 762 photolithography tools at a Samsung Electronics fab from 2021 to 2025, the system handled over 85,000 real-world breakdowns and reduced Mean Time to Recovery by 7 minutes on average. This autonomous system enhances productivity,
reduces costs, and maintains high product quality, supporting the transition to fully
automated environments in the semiconductor industry.
MTGR: Industrial-Scale Generative Recommendation Framework in Meituan
- Ruidong Han
- Bin Yin
- Shangyu Chen
- He Jiang
- Fei Jiang
- Xiang Li
- Chi Ma
- Mincong Huang
- Xiaoguang Li
- Chunzhen Jing
- Yueming Han
- MengLei Zhou
- Lei Yu
- Chuan Liu
- Wei Lin
The scaling law has recently been validated in recommendation systems that adopt generative recommendation strategies to achieve scalability. However, these generative approaches require abandoning the meticulously constructed cross features of traditional recommendation models, leading to a significant decline in model performance. To address this challenge, we propose Meituan Generative Recommendation (MTGR), which is based on the HSTU architecture
and is capable of retaining the original deep learning recommendation model (DLRM)
features, including cross features. Additionally, MTGR achieves training and inference
acceleration through user-level compression to ensure efficient scaling. We also propose
Group-Layer Normalization (GLN) to enhance the performance of encoding within different
semantic spaces and the dynamic masking strategy to avoid information leakage. We
further optimize the training frameworks, enabling support for our models with 10
to 100 times computational complexity compared to the DLRM, without significant cost
increases. MTGR achieved 65x FLOPs for single-sample forward inference compared to
the DLRM model, resulting in the largest gain in nearly two years both offline and
online. This breakthrough was successfully deployed on Meituan, the world's largest
food delivery platform, where it has been handling the main traffic.
Cross-Domain Graph Neural Networks for Notification at LinkedIn
- Shihai He
- Julie Choi
- Tianqi Li
- Zhiwei Ding
- Peng Du
- Priya Bannur
- Franco Liang
- Fedor Borisyuk
- Padmini Jaikumar
- Xiaobing Xue
- Viral Gupta
Notification recommendation systems are critical to driving user engagement on professional
platforms like LinkedIn. Designing such systems involves integrating heterogeneous
signals across domains, capturing temporal dynamics, and optimizing for multiple,
often competing, objectives. Graph Neural Networks (GNNs) provide a powerful framework
for modeling complex interactions in such environments.
In this paper, we present a cross-domain GNN-based system deployed at LinkedIn that
unifies user, content, and activity signals into a single, large-scale graph. By training
on this cross-domain structure, our model significantly outperforms single-domain
baselines on key tasks, including click-through rate (CTR) prediction and professional
engagement. We introduce architectural innovations including temporal modeling and
multi-task learning, which further enhance performance.
Deployed in LinkedIn's notification system, our approach led to a 0.10% lift in weekly
active users and a 0.62% improvement in CTR. We detail our graph construction process,
model design, training pipeline, and both offline and online evaluations. Our work
demonstrates the scalability and effectiveness of cross-domain GNNs in real-world,
high-impact applications.
Heterogeneous Influence Maximization in User Recommendation
- Hongru Hou
- Jiachen Sun
- Wenqing Lin
- Wendong Bi
- Xiangrong Wang
- Deqing Yang
User recommendation systems enhance user engagement by encouraging users to act as
inviters to interact with other users (invitees), potentially fostering information
propagation. Conventional recommendation methods typically focus on modeling interaction
willingness. Influence-Maximization (IM) methods focus on identifying a set of users
to maximize the information propagation. However, existing methods face two significant
challenges. First, recommendation methods fail to unleash the candidates' spread capability.
Second, IM methods fail to account for the willingness to interact. To solve these
issues, we propose two models named HeteroIR and HeteroIM. HeteroIR provides an intuitive
solution to unleash the dissemination potential of user recommendation systems. HeteroIM
fills the gap between the IM method and the recommendation task, improving interaction
willingness and maximizing spread coverage. HeteroIR introduces a two-stage framework to estimate spread profits. HeteroIM incrementally selects the most influential invitees to recommend and reranks them based on the number of reverse reachable (RR) sets containing inviters and invitees, where an RR set denotes a set of nodes that can reach a target via propagation. Extensive experiments show that HeteroIR and HeteroIM significantly outperform state-of-the-art baselines with p-value < 0.05. Furthermore, we have
deployed HeteroIR and HeteroIM in Tencent's online gaming platforms and gained an
8.5% and 10% improvement in the online A/B test, respectively. Implementation codes
are available at https://github.com/socialalgo/HIM.
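For readers unfamiliar with reverse reachable sets, the sketch below samples RR sets under an independent-cascade model and greedily selects the nodes covering the most of them, which is the standard way RR sets approximate expected spread; the toy graph and propagation probability are assumptions, and this is not the HeteroIM implementation.

```python
# Minimal sketch of RR-set-based selection. Graph, propagation model, and data are toys.
import random

def sample_rr_set(nodes, in_edges, p=0.3):
    """One RR set: nodes that reach a random target under independent-cascade sampling."""
    target = random.choice(nodes)
    rr, frontier = {target}, [target]
    while frontier:
        v = frontier.pop()
        for u in in_edges.get(v, []):
            if u not in rr and random.random() < p:   # edge is "live" with probability p
                rr.add(u)
                frontier.append(u)
    return rr

def greedy_select(nodes, rr_sets, k):
    """Pick k nodes that cover the most RR sets (proxy for expected spread)."""
    chosen, covered = [], set()
    for _ in range(k):
        best = max(nodes, key=lambda v: sum(1 for i, s in enumerate(rr_sets)
                                            if v in s and i not in covered))
        chosen.append(best)
        covered |= {i for i, s in enumerate(rr_sets) if best in s}
    return chosen

random.seed(0)
nodes = list(range(20))
in_edges = {v: random.sample(nodes, 3) for v in nodes}   # toy directed graph
rr_sets = [sample_rr_set(nodes, in_edges) for _ in range(500)]
print(greedy_select(nodes, rr_sets, k=3))   # candidates with the largest estimated spread
```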
An LLM-based Behavior Modeling Framework for Malicious User Detection
- Meng Jiang
- Wenjie Wang
- Chongming Gao
- Shaofeng Hu
- Kaishen Ou
- Hui Lin
- Fuli Feng
Malicious users pose significant threats to social platforms. Extensive efforts have
leveraged user behavior sequences to model relationships between various actions and
capture behavioral patterns for malicious user detection; however, they rely on behavior
IDs, ignoring valuable behavior content such as self-introductions in friend requests,
which offer crucial clues for detecting malicious users. We thus propose leveraging
Large Language Models (LLMs) to jointly model IDs and content in user behavior sequences.
The key to effective malicious user detection is to infer malicious user behavior
patterns. However, inferring these patterns from labeled behavior sequences suffers
from poor data efficiency and limited generalization, resulting in suboptimal malicious
user detection performance.
To overcome the limitations, we propose leveraging the malicious user specifications
(e.g., definitions and common deceptive tactics) from the existing expert handbook. These
specifications guide LLMs in reasoning over user behavior IDs and content before making
predictions. To this end, we introduce an LLM-based behavior modeling framework with
an expert handbook to enhance LLMs' behavior reasoning. We first distill the user's
behaviors into a concise summary, guided by malicious user specifications in the expert
handbook, and then feed the summary and users' demographic features into LLMs for
comprehensive reasoning and detection. We conduct extensive online and offline experiments
on the Weixin platform, validating the superiority of the proposed framework over
the original Weixin detection baseline, achieving, for example, a 5.34% improvement
in F1-Score.
GOProteinGNN: Leveraging Protein Knowledge Graphs for Protein Representation Learning
- Dan Kalifa
- Uriel Singer
- Kira Radinsky
Proteins are central to biological processes and indispensable for living organisms.
Accurate representation of proteins is crucial, especially in drug development. Recent
advances have applied machine learning for unsupervised protein representation learning.
However, these approaches often focus solely on the amino acid sequence of proteins
and lack factual knowledge about proteins and their interactions, thus limiting their
performance. In this study, we present GOProteinGNN, a novel architecture that enhances
protein language models by integrating protein knowledge graph information during
the creation of amino acid level representations. Our approach allows for the integration
of information at both the individual amino acid level and the entire protein level,
enabling a comprehensive and effective learning process through graph-based learning.
By doing so, we can capture complex relationships and dependencies between proteins
and their functional annotations, resulting in more robust and contextually enriched
protein representations. Unlike previous methods, GOProteinGNN uniquely learns the
entire protein knowledge graph during training, which allows it to capture broader
relational nuances and dependencies beyond mere triplets as done in previous work.
We perform a comprehensive evaluation on several downstream tasks, demonstrating that
GOProteinGNN consistently outperforms previous methods, showcasing its effectiveness
and establishing it as a state-of-the-art solution for protein representation learning.
We discuss the practical integration of GOProteinGNN in a laboratory setting for lipid
nanoparticle-based drug delivery, aiming to bypass the blood-brain barrier and discover
novel components, with positive results observed in mice.
Smart ECU: Scalable On-Vehicle Deployment of Drivetrain Fault Classification Systems
for Commercial Electric Vehicles
- Jaeho Kim
- Kwangryeol Park
- Kyu Hwan Lee
- Jeongmin Oh
- Dongjin Park
- Hyunseok Oh
- Youngrock Chung
- Kyung-Woo Lee
- Dae-Un Sung
- Seulki Lee
We present Smart ECU, the first on-vehicle drivetrain fault classification solution
for motor-reducers on commercial electric vehicles (EVs), designed to be scalable
in mass production. To develop and validate this system, we collect real-world vibration
data from seven different EV models (e.g. Hyundai IONIQ 5, KIA EV6) and over 19 drivetrains under diverse driving conditions.
This work addresses key challenges in deploying motor-reducer fault classification
functionality onto an extremely resource-constrained ECU environment, facilitating
on-vehicle deployment of PHM solutions on commercially manufactured EVs. Specifically,
we tackle the following challenges: (1) real-vehicle data collection, (2) development
under tight ECU resource constraints, (3) class imbalance between normal and fault
conditions, and (4) scalability of the fault classification system across different
EV models with limited fault data availability. We deploy and evaluate Smart ECU on
both intra-car and inter-car scenarios, showing strong generalization performance
in both setups. The proposed method enables rapid development of fault classification
for new vehicle designs without requiring fault data from customer usage, significantly
shortening the deployment timeline. Our solution addresses both technical and industrial
challenges in deploying ECU-based smart diagnostics for commercial EVs, while also
demonstrating broader applicability beyond drivetrain systems to other critical vehicle
components. To the best of our knowledge, this is the first work to (1) collect real-world
motor-reducer fault data, (2) implement a lightweight fault classification algorithm
on ECUs, and (3) demonstrate its scalability across various EV types.
DeepAries: Adaptive Rebalancing Interval Selection for Enhanced Portfolio Selection
- Jinkyu Kim
- Hyungjung Yi
- Mogan Gim
- Donghee Choi
- Jaewoo Kang
We propose DeepAries, a novel deep reinforcement learning framework for dynamic portfolio management that
jointly optimizes the timing and allocation of rebalancing decisions. Unlike prior
reinforcement learning methods that employ fixed rebalancing intervals regardless
of market conditions, DeepAries adaptively selects optimal rebalancing intervals along with portfolio weights to
reduce unnecessary transaction costs and maximize risk-adjusted returns. Our framework
integrates a Transformer-based state encoder, which effectively captures complex long-term
market dependencies, with Proximal Policy Optimization (PPO) to generate simultaneous
discrete (rebalancing intervals) and continuous (asset allocations) actions. Extensive
experiments on multiple real-world financial markets demonstrate that DeepAries significantly outperforms traditional fixed-frequency and full-rebalancing strategies
in terms of risk-adjusted returns, transaction costs, and drawdowns. Additionally,
we provide a live demo of DeepAries at https://deep-aries.github.io/, along with the source code and dataset at https://github.com/dmis-lab/DeepAries,
illustrating DeepAries' capability to produce interpretable rebalancing and allocation
decisions aligned with shifting market regimes. Overall, DeepAries introduces an innovative paradigm for adaptive and practical portfolio management
by integrating both timing and allocation into a unified decision-making process.
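A hedged sketch of the hybrid action interface the abstract describes: a categorical head samples the rebalancing interval while a softmax head emits portfolio weights, so a PPO-style policy can act in both spaces at once. Module names, dimensions, and candidate intervals below are hypothetical, not the DeepAries architecture.

```python
# Illustrative hybrid action head: discrete rebalancing interval + continuous allocation.
import torch
import torch.nn as nn

class HybridPolicyHead(nn.Module):
    def __init__(self, state_dim, n_assets, intervals=(1, 5, 20)):
        super().__init__()
        self.intervals = intervals
        self.interval_head = nn.Linear(state_dim, len(intervals))  # discrete action logits
        self.alloc_head = nn.Linear(state_dim, n_assets)           # continuous allocation logits

    def forward(self, state):
        interval_dist = torch.distributions.Categorical(logits=self.interval_head(state))
        interval_idx = interval_dist.sample()
        weights = torch.softmax(self.alloc_head(state), dim=-1)    # long-only portfolio weights
        return interval_idx, weights, interval_dist.log_prob(interval_idx)

head = HybridPolicyHead(state_dim=32, n_assets=10)
state = torch.randn(4, 32)                       # e.g. a Transformer-encoded market state
idx, w, logp = head(state)
print(idx.shape, w.sum(dim=-1))                  # weights sum to 1 for each sample
```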
Exploring Database Normalization Effects on SQL Generation
Schema design, particularly normalization, is a critical yet often overlooked factor
in natural language to SQL (NL2SQL) systems. Most prior research evaluates models
on fixed schemas, overlooking the influence of design on performance. We present the
first systematic study of schema normalization's impact, evaluating eight leading
large language models on synthetic and real-world datasets with varied normalization
levels. We construct controlled synthetic datasets with formal normalization (1NF-3NF)
and real academic paper datasets with practical schemas. Our results show that denormalized
schemas offer high accuracy on simple retrieval queries, even with cost-effective
models in zero-shot settings. In contrast, normalized schemas (2NF/3NF) introduce
challenges such as errors in base table selection and join type prediction; however,
these issues are substantially mitigated by providing few-shot examples. For aggregation
queries, normalized schemas yielded better performance, mainly due to their robustness
against the data duplication and NULL value issues that cause errors in denormalized
schemas. These findings suggest that the optimal schema design for NL2SQL applications
depends on the types of queries to be supported. Our study demonstrates the importance
of considering schema design when developing NL2SQL interfaces and integrating adaptive
schema selection for real-world scenarios.
THEME: Enhancing Thematic Investing with Semantic Stock Representations and Temporal
Dynamics
- Hoyoung Lee
- Wonbin Ahn
- Suhwan Park
- Jaehoon Lee
- Minjae Kim
- Sungdong Yoo
- Taeyoon Lim
- Woohyung Lim
- Yongjae Lee
Thematic investing, which aims to construct portfolios aligned with structural trends,
remains a challenging endeavor due to overlapping sector boundaries and evolving market
dynamics. A promising direction is to build semantic representations of investment
themes from textual data. However, despite their power, general-purpose LLM embedding
models are not well-suited to capture the nuanced characteristics of financial assets,
since the semantic representation of investment assets may differ fundamentally from
that of general financial text. To address this, we introduce THEME, a framework that
fine-tunes embeddings using hierarchical contrastive learning. THEME aligns themes
and their constituent stocks using their hierarchical relationship, and subsequently
refines these embeddings by incorporating stock returns. This process yields representations
effective for retrieving thematically aligned assets with strong return potential.
Empirical results demonstrate that THEME excels in two key areas. For thematic asset
retrieval, it significantly outperforms leading large language models. Furthermore,
its constructed portfolios demonstrate compelling performance. By jointly modeling
thematic relationships from text and market dynamics from returns, THEME generates
stock embeddings specifically tailored for a wide range of practical investment applications.
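A minimal sketch of the contrastive alignment step in spirit: matched (theme, constituent-stock) embedding pairs are pulled together with a symmetric InfoNCE loss. The batch construction, temperature, and dimensions are assumptions, not THEME's actual training recipe.

```python
# Illustrative InfoNCE-style alignment of theme embeddings with constituent-stock embeddings.
import torch
import torch.nn.functional as F

def theme_stock_contrastive(theme_emb, stock_emb, temperature=0.07):
    """theme_emb, stock_emb: (batch, d); row i of each forms a matched (theme, stock) pair."""
    theme = F.normalize(theme_emb, dim=-1)
    stock = F.normalize(stock_emb, dim=-1)
    logits = theme @ stock.T / temperature            # (batch, batch) similarity matrix
    labels = torch.arange(theme.size(0))              # positives sit on the diagonal
    # Symmetric loss: themes retrieve their stocks and stocks retrieve their themes.
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels))

loss = theme_stock_contrastive(torch.randn(8, 64), torch.randn(8, 64))
print(float(loss))
```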
Waypoint POI Recommendation for Vehicle Navigation Services using Hierarchical Graphs
and Contrastive Learning
- Jongsoo Lee
- Heejun Shin
- Namhyuk Kim
- Dong-Kyu Chae
Modern vehicle navigation systems can greatly benefit from waypoint point-of-interest (POI) recommendation, which suggests personalized intermediate stops along a driving route. This paper
defines the novel waypoint POI recommendation problem: given a starting point and
a destination, recommend one or more personalized POIs to visit en route. This scenario
(e.g., suggesting a lunch stop during a road trip) differs from the conventional ''next
POI'' recommendation in that it infers waypoint POIs from only two (origin and destination)
inputs and predicts multiple intermediate stops rather than a single next location.
To solve this problem, we propose WayPOI, a novel recommender model for waypoint POI suggestion based on hierarchical-graph-based contrastive learning. WayPOI constructs a hierarchical graph that captures both individual and group-level
behavioral patterns of users and POIs, and it employs a contrastive learning strategy
to learn effective user and POI representations from sparse data. Through experiments
on real-world driving data provided by Hyundai as well as on three public datasets,
we demonstrate that WayPOI significantly outperforms several recent POI recommendation
models, even though these baselines were carefully re-formed and retrained to perform
waypoint recommendation for a fair comparison. Our ablation study confirms the benefit
of each proposed component.
Anomaly Detection for Advanced Driver Assistance System with NCDE-based Normalizing
Flow
- Kangjun Lee
- Minha Kim
- Youngho Jun
- Simon S. Woo
For electric vehicles, the Adaptive Cruise Control (ACC) in Advanced Driver Assistance
Systems (ADAS) is designed to assist braking based on driving conditions and user
patterns. However, the driving data collected during development are limited and lack
diversity, leading to late or aggressive braking. Moreover, it is necessary to effectively
identify anomalies in braking patterns, which is critical for self-driving autonomous
vehicles. We propose Graph Neural Controlled Differential Equation Normalizing Flow (GDFlow), which leverages Normalizing Flow (NF) with Neural Controlled Differential Equations
(NCDE) to learn the distribution of normal driving patterns. Our approach captures
spatio-temporal information from sensor data and accurately models continuous changes
in driving patterns. Additionally, we introduce a quantile-based maximum likelihood
objective to improve the likelihood estimate of normal data at the margin of the distribution.
We validate GDFlow using real-world electric vehicle driving data that we collected
from Hyundai IONIQ5 and GV80EV. Our model achieves state-of-the-art (SOTA) performance
compared to nine baselines across four dataset configurations of different vehicle
types and drivers. Furthermore, our model outperforms the latest anomaly detection
methods across four time series benchmark datasets. Our approach demonstrates superior
efficiency in inference time compared to existing methods.
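One plausible reading of a quantile-based maximum likelihood objective is sketched below: samples whose likelihood falls in the lower tail of the batch are up-weighted so the density model fits the margin of the normal-data distribution better. The weighting scheme and constants are assumptions, not GDFlow's exact objective.

```python
# Illustrative quantile-weighted negative log-likelihood for density-based anomaly models.
import torch

def quantile_weighted_nll(log_probs, q=0.1, boost=2.0):
    """log_probs: (batch,) log-likelihoods of normal training samples under the flow."""
    threshold = torch.quantile(log_probs.detach(), q)        # lower-tail cutoff
    weights = torch.where(log_probs.detach() < threshold,
                          torch.full_like(log_probs, boost), # emphasize the margin
                          torch.ones_like(log_probs))
    return -(weights * log_probs).mean()

log_probs = torch.randn(256) - 2.0            # stand-in flow log-likelihoods
print(float(quantile_weighted_nll(log_probs)))
```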
OASIS: Harnessing Diffusion Adversarial Network for Ocean Salinity Imputation using
Sparse Drifter Trajectories
- Bo Li
- Yingqi Feng
- Ming Jin
- Xin Zheng
- Yufei Tang
- Laurent Cherubin
- Can Wang
- Alan Wee-Chung Liew
- Qinghua Lu
- Jingwei Yao
- Hong Zhang
- Shirui Pan
- Xingquan Zhu
Ocean salinity plays a vital role in circulation, climate, and marine ecosystems,
yet its measurement is often sparse, irregular, and noisy, especially in drifter-based
datasets. Traditional approaches, such as remote sensing and optimal interpolation,
rely on linearity and stationarity, and are limited by cloud cover, sensor drift,
and low satellite revisit rates. While machine learning models offer flexibility,
they often fail under severe sparsity and lack principled ways to incorporate physical
covariates without specialized sensors. In this paper, we introduce the OceAn Salinity
Imputation System, a novel diffusion adversarial framework designed to address these
challenges by: (1) employing a transformer-based global dependency capturing module
to learn long-range spatio-temporal correlations from sparse trajectories; (2) constructing
a generative imputation model that conditions on easily observed tidal covariates
to progressively refine imputed salinity fields; and (3) using a scheduler diffusion
method to enhance the model's robustness. This unified architecture exploits the periodic
nature of tidal signals as a proxy for unmeasured physical drivers, without the need
for additional equipment. We evaluate OASIS on four benchmark datasets, including
one real-world measurement from Fort Pierce Inlet and three simulated Gulf of Mexico
trajectories. Results show consistent improvements over both traditional and neural
baselines, achieving up to 52.5% reduction in MAE compared to Kriging. We also develop
a lightweight, web-based deployment system that enables salinity imputation through
interactive and batch interfaces, available at: https://github.com/yfeng77/OASIS.
SCAlign: Transaction Event Prediction via Multi-Scale Market Dynamics Alignment
- Boyang Li
- Lingzheng Zhang
- Fugee Tsung
- Xi Zhang
Event prediction plays a pivotal role in analyzing consumer behavior for inventory
and pricing optimization. In dynamic financial markets, customer behavior is often
influenced by price commitment policies, where the historical and pre-announced future
transaction price dynamics can lead to complex behavior patterns, such as advance
consumption or delayed purchasing. Therefore, these phenomena pose significant challenges
to traditional event modeling approaches that rely solely on consumer behaviors. To
address this problem, we propose SCAlign, a cross-domain and multi-scale framework
for market dynamics alignment, designed for event prediction. Our model integrates
both heterogeneous historical and limited observable future commitment prices at different
scales, aligning customer behavior with market fluctuations across multiple time scales.
Finally, through a Mixture-of-Experts (MoE) framework, the model dynamically fuses
these aligned features, enabling adaptive selection of the relevant and appropriate
representations for prediction tasks. Empirical evaluations across diverse transaction
environments demonstrate that our model outperforms state-of-the-art prediction baselines.
Furthermore, it achieves optimal performance across varying data scales, showcasing
its robustness and generalizability.
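As a generic illustration of the fusion step (not the SCAlign architecture), the sketch below shows a small Mixture-of-Experts layer whose gate adaptively weights expert outputs over already-aligned features; the expert count and dimensions are arbitrary.

```python
# Illustrative Mixture-of-Experts fusion over aligned multi-scale features.
import torch
import torch.nn as nn

class MoEFusion(nn.Module):
    def __init__(self, dim, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.gate = nn.Linear(dim, n_experts)

    def forward(self, x):                                      # x: (batch, dim) aligned features
        gates = torch.softmax(self.gate(x), dim=-1)            # (batch, n_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, E, dim)
        return (gates.unsqueeze(-1) * expert_out).sum(dim=1)   # adaptive expert mixture

fused = MoEFusion(dim=32)(torch.randn(5, 32))
print(fused.shape)
```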
Taming Ultra-Long Behavior Sequence in Session-wise Generative Recommendation
- Wuchao Li
- Shiyao Wang
- Kuo Cai
- Jiaxin Deng
- Xingmei Wang
- Qigen Hu
- Defu Lian
- Guorui Zhou
Generative recommendation has emerged as a transformative paradigm in recommender
systems, enabling modeling user behavior autoregressively without explicit target
conditioning. While this approach eliminates the need for target signals, it necessitates
compressing extensive historical interactions, potentially spanning lifelong sequences, into
coherent interest representations. Conventional methods for handling long sequences
typically rely on target-guided search mechanisms (e.g., SIM) to efficiently filter
and compress behaviors. However, this strategy is incompatible with generative frameworks
due to their target-agnostic nature. To address these challenges, we propose a novel
encoder-decoder model named HiCoGen (Hierarchical Compression-based Session-wise Generative Model), which efficiently models long-term interests in generative models.
In the encoder, HiCoGen compresses behavior sequences using hierarchical content similarity
clustering and employs a hierarchical attention architecture to reduce sequence length
while preserving information integrity. In the decoder, HiCoGen uses session-wise
generation instead of point-wise generation to better align with industrial short-video
applications. To enhance the stability of session-wise generation, we introduce an
auxiliary Hierarchical Multi-Token Prediction module. Extensive experiments on public
and industrial datasets show significant performance gains over state-of-the-art methods
(21.2% in ML-1M and 35.6% in industrial datasets on NDCG@3). We also conducted visualization
and performance analysis to explore the advantages of long sequence modeling.
VocQuiz: Vocabulary Question Generation for English Language Education
- Yongqi Li
- Jiajun Wu
- Shangqing Tu
- Jifan Yu
- Huiqin Liu
- Lei Hou
- Juanzi Li
Designing effective English vocabulary question generation tools demands a shift from
labor-intensive content creation to large language model (LLM) automation that can
adapt to varied educational contexts. Current approaches tend to offer a limited variety
of question types, which restricts their practical application in real classroom settings.
To better meet the demands of English teaching institutions, we present VocQuiz, a
vocabulary question generation system that 1) combines generalization capabilities
of LLMs with reliable language resources, including dictionaries, NLP datasets and
authentic corpora, to enhance both contextual relevance and linguistic accuracy; 2)
supports multiple question types, such as similar word selection and word collocation,
to accommodate various instructional requirements; and 3) employs an iterative workflow
to generate and refine questions, ensuring high-quality outputs and consistent
assessment standards. VocQuiz offers a practical, deployable solution that helps educators
create quiz-based instructional materials, reducing preparation effort while effectively
assessing students' mastery of vocabulary.
MHSNet: An MoE-based Hierarchical Semantic Representation Network for Accurate Duplicate
Resume Detection with Large Language Model
- Yu Li
- Zulong Chen
- Wenjian Xu
- Hong Wen
- Yipeng Yu
- Manlung Yiu
- Yuyu Yin
To maintain the company's talent pool, recruiters need to continuously search for
resumes from third-party websites (e.g., LinkedIn, Indeed). However, fetched resumes
are often incomplete and inaccurate. To improve the quality of third-party resumes
and enrich the company's talent pool, it is essential to conduct duplication detection
between the fetched resumes and those already in the company's talent pool. Such duplication
detection is challenging due to the semantic complexity, structural heterogeneity,
and information incompleteness of resume texts. To this end, we propose MHSNet, a
multi-level identity verification framework that fine-tunes BGE-M3 using contrastive
learning. With the fine-tuned BGE-M3, MHSNet generates multi-level sparse and dense
representations for resumes, enabling the computation of corresponding multi-level
semantic similarities. Moreover, the state-aware Mixture-of-Experts (MoE) is employed
in MHSNet to handle diverse incomplete resumes. Experimental results verify the effectiveness
of MHSNet.
TBGRecall: A Generative Retrieval Model for E-commerce Recommendation Scenarios
- Zida Liang
- Changfa Wu
- Dunxian Huang
- Weiqiang Sun
- Ziyang Wang
- Yuliang Yan
- Jian Wu
- Yuning Jiang
- Bo Zheng
- Ke Chen
- Silu Zhou
- Yu Zhang
Recommendation systems are essential tools in modern e-commerce, facilitating personalized
user experiences by suggesting relevant products. Recent advancements in generative
models have demonstrated potential in enhancing recommendation systems; however, these
models often exhibit limitations in optimizing retrieval tasks, primarily due to their
reliance on autoregressive generation mechanisms. Conventional approaches introduce
sequential dependencies that impede efficient retrieval, as they are inherently unsuitable
for generating multiple items without positional constraints within a single request
session. To address these limitations, we propose TBGRecall, a framework integrating
Next Session Prediction (NSP), designed to enhance generative retrieval models for
e-commerce applications. Our framework reformulation involves partitioning input samples
into multi-session sequences, where each sequence comprises a session token followed
by a set of item tokens, and then further incorporates multiple optimizations tailored
to the generative task in retrieval scenarios. In terms of training methodology, our
pipeline integrates limited historical data pre-training with stochastic partial incremental
training, significantly improving training efficiency and emphasizing the superiority
of data recency over sheer data volume. Our extensive experiments, conducted on public
benchmarks alongside a large-scale industrial dataset from TaoBao, show TBGRecall
outperforms the state-of-the-art recommendation methods, and exhibits a clear scaling
law trend. Ultimately, NSP represents a significant advancement in the effectiveness
of generative recommendation systems for e-commerce applications.
Stratified Expert Cloning for Retention-Aware Recommendation at Scale
- Chengzhi Lin
- Annan Xie
- Shuchang Liu
- Wuhong Wang
- Chuyuan Wang
- Yongq Li
- Han Li
User retention is critical in large-scale recommender systems, significantly influencing
online platforms' long-term success. Existing methods typically focus on short-term
engagement, neglecting the evolving dynamics of user behaviors over time. Reinforcement
learning (RL) methods, though promising for optimizing long-term rewards, face challenges
like delayed credit assignment and sample inefficiency. We introduce Stratified Expert
Cloning (SEC), an imitation learning framework that leverages abundant interaction
data from high-retention users to learn robust policies. SEC incorporates: 1) multi-level
expert stratification to model diverse retention behaviors; 2) adaptive expert selection
to dynamically match users with appropriate policies based on their state and retention
history; and 3) action entropy regularization to enhance recommendation diversity
and policy generalization. Extensive offline evaluations and online A/B tests on major
video platforms (Kuaishou and Kuaishou Lite) with hundreds of millions of users validate
SEC's effectiveness. Results show substantial improvements, achieving cumulative lifts
of 0.098% and 0.122% in active days on the two platforms respectively, each translating
into over 200,000 additional daily active users.
GReF: A Unified Generative Framework for Efficient Reranking via Ordered Multi-token
Prediction
- Zhijie Lin
- Zhuofeng Li
- Chenglei Dai
- Wentian Bao
- Shuai Lin
- Enyun Yu
- Haoxiang Zhang
- Liang Zhao
In a multi-stage recommendation system, reranking plays a crucial role in modeling
intra-list correlations among items. A key challenge lies in exploring optimal sequences
within the combinatorial space of permutations. Recent research follows a two-stage
(generator-evaluator) paradigm, where a generator produces multiple feasible sequences,
and an evaluator selects the best one. In practice, the generator is typically implemented
as an autoregressive model. However, these two-stage methods face two main challenges.
First, the separation of the generator and evaluator hinders end-to-end training.
Second, autoregressive generators suffer from inference efficiency. In this work,
we propose a Unified Generative Efficient Reranking Framework (GReF) to address the
two primary challenges. Specifically, we introduce Gen-Reranker, an autoregressive
generator featuring a bidirectional encoder and a dynamic autoregressive decoder to
generate causal reranking sequences. Subsequently, we pre-train Gen-Reranker on the
item exposure order for high-quality parameter initialization. To eliminate the need
for the evaluator while integrating sequence-level evaluation during training for
end-to-end optimization, we propose post-training the model through Rerank-DPO. Moreover,
for efficient autoregressive inference, we introduce ordered multi-token prediction
(OMTP), which trains Gen-Reranker to simultaneously generate multiple future items
while preserving their order, ensuring practical deployment in real-time recommender
systems. Extensive offline experiments demonstrate that GReF outperforms state-of-the-art
reranking methods while achieving latency that is nearly comparable to non-autoregressive
models. Additionally, GReF has also been deployed in a real-world video app Kuaishou
with over 300 million daily active users, significantly improving online recommendation
quality.
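A small sketch of what ordered multi-token prediction can look like in general: K output heads predict the next K items in order from a shared decoder state, and the loss averages the per-position cross-entropies. The head design, sizes, and data here are assumptions rather than GReF's implementation.

```python
# Illustrative ordered multi-token prediction: K heads emit K future items per decode step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OMTPHead(nn.Module):
    def __init__(self, hidden, n_items, k_future=3):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(hidden, n_items) for _ in range(k_future)])

    def forward(self, state):                        # state: (batch, hidden)
        return [head(state) for head in self.heads]  # one logit vector per future position

def omtp_loss(logits_per_pos, targets):              # targets: (batch, k_future), ordered
    return sum(F.cross_entropy(logits, targets[:, i])
               for i, logits in enumerate(logits_per_pos)) / len(logits_per_pos)

head = OMTPHead(hidden=64, n_items=1000, k_future=3)
state = torch.randn(8, 64)                           # e.g. output of a reranking decoder
targets = torch.randint(0, 1000, (8, 3))
print(float(omtp_loss(head(state), targets)))
```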
AutoDW-TS: Automated Data Wrangling for Time-Series Data
- Lei Liu
- So Hasegawa
- Shailaja Keyur Sampat
- Mehdi Bahrami
- Wei-Peng Chen
- Kodai Toyota
- Takashi Kato
- Takumi Akazaki
- Akira Ura
- Tatsuya Asai
Data wrangling - the process of preparing raw data for analysis through cleansing,
transformation, and enrichment - is a critical step in the data science pipeline.
Its importance is amplified for time-series data, which underpins many applications,
with forecasting being one of the most prominent tasks. Yet, current practices remain
largely manual, time-consuming, and error-prone, limiting productivity and scalability.
In this paper, we introduce AutoDW-TS, an automated approach to time-series data wrangling
powered by Large Language Models (LLMs). Our method offers an end-to-end pipeline,
automating key stages such as table merging, prediction engineering, cleansing, imputation,
and enrichment. To support diverse use cases, we developed multiple systems, including
an interactive AutoDW-TS WebApp, Web APIs, and an AI agent. We share insights from
developing and deploying these systems, along with results from an extensive evaluation
across 38 time-series benchmarks. Our findings show that AutoDW-TS significantly improves
forecasting performance, demonstrating its effectiveness and potential to transform
time-series data preparation at scale.
Prompt Tuning as User Inherent Profile Inference Machine
- Yusheng Lu
- Zhaocheng Du
- Xiangyang Li
- Pengyue Jia
- Yejing Wang
- Weiwen Liu
- Yichao Wang
- Huifeng Guo
- Ruiming Tang
- Zhenhua Dong
- Yongrui Duan
- Xiangyu Zhao
Large Language Models (LLMs) have exhibited significant promise in recommender systems
by empowering user profiles with their extensive world knowledge and superior reasoning
capabilities. However, LLMs face challenges like unstable instruction compliance,
modality gaps, and high inference latency, leading to textual noise and limiting their
effectiveness in recommender systems. To address these challenges, we propose UserIP-Tuning,
which uses prompt-tuning to infer user profiles. It integrates the causal relationship
between user profiles and behavior sequences into LLMs' prompts. It employs Expectation
Maximization (EM) to infer the embedded latent profile, minimizing textual noise by
fixing the prompt template. Furthermore, a profile quantization codebook bridges the
modality gap by categorizing profile embeddings into collaborative IDs pre-stored
for online deployment. This improves time efficiency and reduces memory usage. Experiments
show that UserIP-Tuning outperforms state-of-the-art recommendation algorithms. An
industry application confirms its effectiveness, robustness, and transferability.
The presented solution has been deployed in Huawei AppGallery's Explore page since
May 2025, serving 2 million daily active users, delivering significant improvements in real-world recommendation scenarios. The code
is publicly available for replication at https://github.com/Applied-Machine-Learning-Lab/UserIP-Tuning.
TRAWL: External Knowledge-Enhanced Recommendation with LLM Assistance
- Weiqing Luo
- Chonggang Song
- Lingling Yi
- Gong Cheng
Combining semantic information with behavioral data is a crucial research area in
recommender systems. A promising approach involves leveraging external knowledge to
enrich behavioral-based recommender systems with abundant semantic information. However,
this approach faces two primary challenges: (1) denoising raw external knowledge and
(2) adapting semantic representations. To address these challenges, we propose exTernal
knowledge-enhanced RecommendAtion With LLM assistance (TRAWL). This method utilizes
large language models to extract relevant recommendation knowledge from raw external
data and employs a contrastive learning strategy for adapter training. Experiments
on public datasets and real-world online recommender systems validate the effectiveness
of our approach.
QARM: Quantitative Alignment Multi-Modal Recommendation at Kuaishou
- Xinchen Luo
- Jiangxia Cao
- Tianyu Sun
- Jinkai Yu
- Rui Huang
- Wei Yuan
- Hezheng Lin
- Yichen Zheng
- Shiyao Wang
- Qigen Hu
- Changqing Qiu
- Jiaqi Zhang
- Xu Zhang
- Zhiheng Yan
- Jingming Zhang
- Simin Zhang
- Mingxing Wen
- Zhaojie Liu
- Guorui Zhou
In recent years, with the significant evolution of multi-modal large models, many
recommender researchers realized the potential of multi-modal information for user
interest modeling. In industry, a widely used modeling architecture is a cascading paradigm:
(1) first pre-training a multi-modal model to provide omnipotent representations for
downstream services; (2) The downstream recommendation model takes the multi-modal
representation as additional input to fit real user-item behaviours. Although this paradigm achieves remarkable improvements, two problems still limit model performance: (1) Representation Unmatching: the pre-trained multi-modal model is supervised by classic NLP/CV tasks, while the recommendation model is supervised by real user-item interactions. As a result, the goals of the two fundamentally different tasks remain relatively separate, and there is no consistent objective aligning their representations; (2) Representation Unlearning: the generated multi-modal representations are stored in a cache and served as fixed extra input to the recommendation model, so they cannot be updated by the recommendation model's gradients, which is unfavorable for downstream training.
Motivated by these two challenges in downstream usage, we introduce
a quantitative multi-modal framework to customize the specialized and trainable multi-modal
information for different downstream models. Specifically, we introduce two insightful
modifications to enhance above framework: (1) Item Alignment to transform the original multi-modal representations to match the real user-item
behaviours distribution. (2) Quantitative Code to transform the aligned multi-modal representations to trainable code ID for downstream
tasks. We conduct detailed experiments and ablation analyses to demonstrate our QARM
effectiveness. Our method has been deployed on Kuaishou's various services, serving
400 million users daily.
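To make the quantitative-code idea concrete in a generic way, the sketch below maps an aligned multi-modal embedding to the ID of its nearest codebook vector, which a downstream model can then treat as a trainable token; the codebook size, dimensionality, and data are illustrative assumptions, not Kuaishou's setup.

```python
# Illustrative vector quantization: aligned embedding -> nearest codebook ID.
import numpy as np

rng = np.random.default_rng(3)
codebook = rng.normal(size=(256, 64))        # 256 code vectors of dimension 64

def quantize(embedding, codebook):
    """Return the code ID whose vector is closest to the embedding (L2 distance)."""
    dists = np.linalg.norm(codebook - embedding, axis=1)
    return int(np.argmin(dists))

item_embedding = rng.normal(size=64)         # aligned multi-modal representation of an item
code_id = quantize(item_embedding, codebook)
print(code_id)   # the downstream model trains its own embedding for this discrete ID
```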
SMTIR: Scenario-Aware Multi-Trigger Induction Network for CTR Prediction
- Xuan Ma
- Yu Shi
- Hao Peng
- Jia Duan
- Zhanhao Ye
- Kunyao Wang
- Kai Yan
- Long Chen
- Zehua Zhang
- Changping Peng
- Zhangang Lin
- Ching Law
Trigger-Induced Recommendation (TIR), which aims to predict user interest based on
a trigger item, has gained considerable traction on e-commerce platforms. Current
TIR methods typically analyze user intent by integrating explicit interest in the
trigger item and implicit interest derived from user historical behaviors. However,
these methods often overlook the contextual information and occurring scenarios related
to the trigger, resulting in an undue emphasis on isolated trigger items and a consequently
restrictive understanding of users' short-term intentions. To address these challenges,
we propose a novel scenario-aware multi-trigger induction method featuring three key
enhancements: (1) The Context Modeling Network learns contextual information associated
with the trigger during the request, improving the understanding of users' real intentions
regarding the trigger item; (2) The Multi-Trigger Learning Network introduces user
latent triggers from various scenarios to uncover users' potential external preferences;
(3) The Scenario Induction Network captures the characteristics of the scenarios in
which triggers occur and performs induction to yield scenario-aware user intentions
prediction. We validate our approach through experiments on multiple industrial datasets,
demonstrating the model's effectiveness. Furthermore, we have integrated the model
into an online advertising system, achieving a 5.46% improvement in Click-Through
Rate (CTR).
IclForge: Enhancing In-Context Learning with Evolutionary Algorithms under Budgeted
Annotation
- Vijit Malik
- Atul Pande
- Anirban Majumder
In-context learning (ICL) has emerged as a powerful paradigm for adapting Large Language
Models (LLMs) to specific tasks without parameter updates. While various strategies
exist for selecting relevant ICL exemplars from a labeled pool, the fundamental challenge
of constructing this high-quality pool remains largely unexplored, especially for
new tasks or domains with limited labeled data. We present IclForge, a novel active learning framework that efficiently selects informative examples
from unlabeled datasets to be annotated and included in the ICL pool. Unlike traditional
active learning methods that optimize for individual example informativeness, IclForge explicitly considers the interdependence of examples within the ICL context. Through
extensive experiments across diverse datasets and LLM architectures, we show that
IclForge outperforms standard active learning baselines by +180-450 basis points while requiring
50% fewer annotations. Our framework is complementary to existing ICL selection strategies
and extends naturally to generative applications, which we demonstrate through experiments
on Math Word Problem (MWP) tasks. These results highlight IclForge's effectiveness in constructing high-quality ICL exemplar pools in resource-constrained
scenarios.
Next-Generation Price Recommendation with LLM-Augmented Graph Transformers
- Hadi Mohammadzadeh Abachi
- Amin Beheshti
- Milad Mosharraf
- Pooyan Asgari
- Majid Namazi
Dynamic pricing on two-sided platforms such as Airbnb presents complex challenges
due to the heterogeneity of listings, user behaviours, and contextual variables. In
this work, we propose a robust and interpretable pricing framework that leverages
Large Language Models (LLMs) and prompt engineering to automate the generation of
high-level meta-features from unstructured and structured listing data. These meta-features
are designed to capture nuanced semantic features that are often overlooked by traditional
feature engineering pipelines. We further integrate these representations into a Transformer-based
Graph Neural Network (GNN), which models the relational and spatial dependencies between
listings in a data-driven manner using several relation-construction strategies. By combining prompt-driven
embeddings with graph-aware contextual learning, our framework significantly enhances
price recommendation accuracy while offering transparency through assortativity analysis.
Extensive experiments on real-world Airbnb datasets demonstrate our approach's performance
in both prediction and generalization to unseen data across neighbourhoods, as well as the interpretability of its outputs.
This work highlights the potential of unifying LLMs, structured graph learning, and
interpretable AI for next-generation dynamic pricing systems.
On the Gap Between Diffusion and Transformer Multi-Tabular Generation
- Gijs Paardekooper
- Jeroen M. Galjaard
- Lydia Y. Chen
Shareable tabular data is of high importance in industry and research. While generating
synthetic records is well-studied, research has only recently extended to relational data synthesis. In the tabular generation setting, diffusion and transformer models exhibit superior
performance over prior art. However, in the relational setting, diffusion models outperform
transformers. This work focuses on the performance gap between tabular transformers
and diffusion models in single (tabular) and multi-table (relational) settings, using
REaLTabformer and ClavaDDPM as representative state-of-the-art models. We evaluate
these architectures on a set of single- and multi-table datasets, highlighting the
gap's root causes between the methods. In our experiments, we attribute this difference
to the influence of contextual information and data representation. To bridge the
gap in the relational setting, we propose two seemingly simple strategies: layer sharing
and contextual cues. This work offers insights into key design considerations for
single- and multi-table generative models, including the incorporation of contextual
information and the reuse of existing knowledge. With the proposed methods, we achieve
improvements of 1.52× and 1.94× for the Logistic Detection and Discriminator Measure
metrics, respectively.
MOHPER: Multi-objective Hyperparameter Optimization Framework for E-commerce Retrieval
System
- Jungbae Park
- Heonseok Jang
E-commerce retrieval and ranking optimization have expanded to incorporate broader
metrics that capture user engagement and business objectives. Modern search frameworks
now incorporate advanced quality features, such as sales counts and document-query
relevance, to better align search results with these goals. Traditional methods typically
focus on click-through rate (CTR) as a measure of engagement or relevance, but this
can miss true purchase intent, creating a gap between user interest and actual conversions.
Joint training with the click-through conversion rate (CTCVR) has become essential
for understanding buying behavior, although its sparsity poses challenges for reliable
optimization. This study presents MOHPER, a Multi-Objective HyperParameter optimization framework for E-commerce Retrieval systems. Using Bayesian optimization and sampling, it jointly optimizes CTR,
CTCVR, and other relevant objectives, with a focus on user engagement and conversion.
To enhance configuration selection in multi-objective optimization, we propose advanced
hyperparameter selection methods, including a meta-configuration voting strategy and
a cumulative training approach that leverages prior optima to improve training efficiency
and performance. Currently deployed in a live setting, our proposed framework substantiates
its practical efficacy in achieving a balanced optimization that aligns with both
user satisfaction and revenue goals.
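A minimal sketch of multi-objective Bayesian hyperparameter optimization of the kind the abstract describes, using the Optuna library. The hyperparameter names and the evaluate_ctr / evaluate_ctcvr offline estimators are hypothetical stubs for illustration, not part of MOHPER.

```python
import optuna

def evaluate_ctr(cfg):    # hypothetical offline CTR estimator (stub for illustration)
    return 1.0 - abs(cfg["bm25"] - 0.6) - 0.1 * cfg["sales"]

def evaluate_ctcvr(cfg):  # hypothetical offline CTCVR estimator (stub for illustration)
    return 1.0 - abs(cfg["sales"] - 1.2) - 0.1 * cfg["bm25"]

def objective(trial):
    cfg = {
        "bm25": trial.suggest_float("bm25_weight", 0.0, 1.0),
        "sales": trial.suggest_float("sales_boost", 0.0, 2.0),
    }
    return evaluate_ctr(cfg), evaluate_ctcvr(cfg)  # two objectives, both maximized

study = optuna.create_study(directions=["maximize", "maximize"])
study.optimize(objective, n_trials=100)

# Pareto-optimal trials; a meta-selection step (e.g. voting across objectives,
# as the abstract describes) would then pick one configuration for deployment.
for t in study.best_trials:
    print(t.values, t.params)
```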
Expert-Guided Diffusion Planner for Auto-Bidding
- Yunshan Peng
- Wenzheng Shu
- Jiahao Sun
- Yanxiang Zeng
- Jinan Pang
- Wentao Bai
- Yunke Bai
- Xialong Liu
- Peng Jiang
Auto-bidding is widely used in advertising systems, serving a diverse range of advertisers.
Generative bidding is increasingly gaining traction due to its strong planning capabilities
and generalizability. Unlike traditional reinforcement learning-based bidding, generative
bidding does not depend on the Markov Decision Process (MDP), thereby exhibiting superior
planning performance in long-horizon scenarios. Conditional diffusion modeling approaches
have shown significant promise in the field of auto-bidding. However, relying solely
on return as the optimality criterion is insufficient to guarantee the generation
of truly optimal decision sequences, as it lacks personalized structural information.
Moreover, the auto-regressive generation mechanism of diffusion models inherently
introduces timeliness risks. To address these challenges, we introduce a novel conditional
diffusion modeling approach that integrates expert trajectory guidance with a skip-step
sampling strategy to improve generation efficiency. The efficacy of this method has
been demonstrated through comprehensive offline experiments and further substantiated
by statistically significant outcomes in online A/B testing, yielding an 11.29% increase
in conversions and a 12.36% growth in revenue relative to the baseline.
Thematic Bottleneck Models for Multimodal Analysis of School Attendance
- Tingrui Qiao
- Caroline Walker
- Chris Cunningham
- Adam Jang-Jones
- Susan Morton
- Kane Meissel
- Yun Sing Koh
Regular school attendance is critical for young people, supporting academic achievement,
social development, and the cultivation of lifelong habits. Existing research for
analysing attendance patterns often relies on structured survey data targeted at parents
and teachers, which overlooks students' perspectives and experiences. To address
this gap, our team developed and deployed the Our Journey platform, which enables young people to share their experiences through multimodal
responses such as texts and images, offering unique insights into the factors influencing
school attendance. The data is linked to official attendance records from the Ministry of Education, allowing the modelling of attendance outcomes based on students' input. To effectively
analyse the data, we propose Thematic Bottleneck Models (TBMs) to enhance the understanding
of subjective experiences behind data and the interpretability of attendance modelling.
TBMs introduce qualitative concepts as intermediate labels, mapping multimodal data
to qualitative insights from thematic analysis before the outcomes. The attendance
modelling with TBMs outperforms existing multimodal methods in predicting attendance
percentage and persistent absenteeism. Analysis of themes within TBMs reveals motivational
and contextual factors associated with regular attendance and persistent absenteeism.
The findings are used to inform education policy and guide strategies to support student
engagement in New Zealand.
Spatial Semantic-based Enhanced Address Parsing via Adaptive Weighted Learning
- Huiling Qin
- Ming Wang
- Yuanxun Li
- Junbo Zhang
- Yu Zheng
Address parsing is an essential task that transforms natural language descriptions
into standardized addresses, crucial for numerous urban applications. Existing methods
struggle with ambiguous expressions, and even Large Language Models face challenges
adapting to specialized domains with limited data. In this study, we focus on developing
a robust framework to map diverse address descriptions into a unified semantic space
of standardized addresses. We propose the Adaptive Weighted Learning-based Address
Parsing (AWLAP) framework, which enhances parsing effectiveness through two key components:
a multi-level constrained classifier that mines correlations between geographic entities
across hierarchies, and an integrated discriminator that adaptively guides optimization
based on parsing complexity. We evaluate the AWLAP using real data from JD Logistics
and Point-of-Interest addresses. Extensive experiments comparing against state-of-the-art
methods demonstrate AWLAP's effectiveness and robustness in address parsing. The proposed
AWLAP framework has been successfully deployed as an address parsing service in practical
applications.
Zipf-Gramming: Scaling Byte N-Grams Up to Production Sized Malware Corpora
- Edward Raff
- Ryan R. Curtin
- Derek Everett
- Robert J. Joyce
- James Holt
A classifier using byte n-grams as features is the only approach we have found fast
enough to meet requirements in size (sub 2 MB), speed (multiple GB/s), and latency
(sub 10 ms) for deployment in numerous malware detection scenarios. However, we've
consistently found that 6-8 grams achieve the best accuracy on our production deployments
but have been unable to deploy regularly updated models due to the high cost of finding
the top-k most frequent n-grams over terabytes of executable programs. Because the
Zipfian distribution well models the distribution of n-grams, we exploit its properties
to develop a new top-k n-gram extractor that is up to 35× faster than the previous
best alternative. Using our new Zipf-Gramming algorithm, we are able to scale up our
production training set and obtain up to 30% improvement in AUC at detecting new malware.
We show theoretically and empirically that our approach selects the top-k items
with little error, and we describe the interplay between theory and engineering required
to achieve these results.
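For readers unfamiliar with approximate top-k counting, the sketch below uses the classic Misra-Gries summary over byte n-grams. It is not the paper's Zipf-Gramming algorithm; it only illustrates why a heavy-tailed (Zipfian) frequency distribution lets the most frequent n-grams be recovered with a small, fixed-size summary. The capacity choice is an assumption.

```python
from collections import Counter

def approx_top_k_ngrams(byte_streams, n=6, k=100_000, capacity=None):
    """Approximate top-k byte n-gram counting with a Misra-Gries summary."""
    capacity = capacity or 10 * k          # summary size; illustrative assumption
    counters = {}
    for data in byte_streams:              # e.g. raw bytes of each executable
        for i in range(len(data) - n + 1):
            gram = data[i:i + n]
            if gram in counters:
                counters[gram] += 1
            elif len(counters) < capacity:
                counters[gram] = 1
            else:
                # Misra-Gries step: decrement every counter, drop zeros.
                for g in list(counters):
                    counters[g] -= 1
                    if counters[g] == 0:
                        del counters[g]
    return Counter(counters).most_common(k)

# Toy usage on two synthetic byte streams.
print(approx_top_k_ngrams([b"MZ\x90\x00" * 50, b"\x00PE\x00" * 50], n=4, k=5))
```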
HuggingGraph: Understanding the Supply Chain of LLM Ecosystem
- Mohammad Shahedur Rahman
- Peng Gao
- Yuede Ji
Large language models (LLMs) leverage deep learning architectures to process and predict
sequences of words, enabling them to perform a wide range of natural language processing
tasks, such as translation, summarization, question answering, and content generation.
As existing LLMs are often built from base models or other pre-trained models and
use external datasets, they can inevitably inherit vulnerabilities, biases, or malicious
components that exist in previous models or datasets. Therefore, it is critical to
understand these components' origin and development process to detect potential risks,
improve model fairness, and ensure compliance with regulatory frameworks. Motivated
by that, this project aims to study such relationships between models and datasets,
which are the central parts of the LLM supply chain. First, we design a methodology
to systematically collect LLMs' supply chain information. Then, we design a new graph
to model the relationships between models and datasets, which is a directed heterogeneous
graph, having 402,654 nodes and 462,524 edges. Lastly, we perform different types
of analysis and make multiple interesting findings.
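A toy sketch of the kind of directed heterogeneous model-dataset graph the abstract describes, using networkx; the node names and relation labels below are illustrative, not the paper's schema.

```python
import networkx as nx

G = nx.DiGraph()

# Nodes are typed as models or datasets (heterogeneous graph).
G.add_node("base-llm-7b", kind="model")
G.add_node("chat-llm-7b", kind="model")
G.add_node("web-corpus", kind="dataset")
G.add_node("instruct-data", kind="dataset")

# Directed edges encode supply-chain relations.
G.add_edge("web-corpus", "base-llm-7b", relation="trained_on")
G.add_edge("base-llm-7b", "chat-llm-7b", relation="fine_tuned_from")
G.add_edge("instruct-data", "chat-llm-7b", relation="trained_on")

# Supply-chain style query: everything a model (transitively) depends on.
upstream = nx.ancestors(G, "chat-llm-7b")
print(sorted(upstream))  # ['base-llm-7b', 'instruct-data', 'web-corpus']
```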
Bridging the Gap Between Sparsity and Redundancy: A Dual-Decoding Framework with Global
Context for Map Inference
- Yudong Shen
- Jiali Mao
- Wenyu Wu
- Yixiao Tong
- Guoping Liu
- Chaoya Wang
Trajectory data has become a key resource for automated map inference due to its low
cost, broad coverage, and continuous availability. However, uneven trajectory density
often leads to fragmented roads in sparse areas and redundant segments in dense regions,
posing significant challenges for existing methods. To address these issues, we propose
DGMap, a dual-decoding framework with global context awareness, featuring Multi-scale Grid Encoding, Mask-enhanced Keypoint Extraction, and Global Context-aware Relation Prediction. By integrating global semantic context with local geometric features, DGMap improves keypoint detection accuracy to reduce road fragmentation in sparse-trajectory
areas. Additionally, the Global Context-aware Relation Prediction module suppresses false connections in dense-trajectory regions by modeling long-range
trajectory patterns. Experimental results on three real-world datasets show that DGMap outperforms state-of-the-art methods by 5% in APLS, with notable performance gains on trajectory data from the Didi Chuxing platform.
LinkML for Collaborative Petrochemical Knowledge Graph Development
- Annie Shoup
- Adam Russell
- Gavin Nicol
LinkML is an emerging ontology modeling framework using YAML syntax instead of the
typical semantic web technologies such as OWL. LinkML's approachable syntax allows
a wide range of stakeholders, including domain experts, software engineers, and data
scientists, to collaboratively define and refine a shared semantic model. We leveraged
LinkML to develop a 1,200+ class ontology capturing complex physical infrastructure,
emissions sources, equipment relationships, and time-series observations critical
to source-level emissions attribution.
We developed tooling to automate transformation of LinkML into graph database schemas,
entity-relationship (ER) diagrams, documentation, and typed domain model code, supporting
rapid iteration and semantic consistency across systems. The resulting large-scale
knowledge graph, deployed in a property graph database, materializes digital twins
for over 700 petrochemical facilities and more than 61,000 pieces of equipment.
This approach dramatically simplifies emissions analytics by standardizing data integration
and replacing fragile, bespoke SQL logic with intuitive graph queries. The accessibility
of LinkML enables domain experts to directly contribute to the model, while providing
a robust foundation for engineers and data scientists to perform scalable, reliable
analytics. The result is a unified platform for emissions reporting and digital transformation.
This paper presents the use of LinkML for semantic modeling for emissions reporting
in the petrochemical industry, the tooling supporting its deployment, and the challenges
and successes of the approach.
AutoCoRe-FL: Automatic Concept-based Rule Reasoning in Federated Learning
- Ahmed Soliman
- Radwa El Shawi
Federated learning (FL) enables decentralized model training without centralizing
raw data, yet achieving interpretability under such constraints remains challenging.
We propose AutoCoRe-FL, a framework for interpretable FL that eliminates the need
for predefined or manually labeled concepts. In AutoCoRe-FL, each client automatically
extracts high-level visual concepts (clusters of semantically coherent image regions
that correspond to human-understandable properties) using local segmentation, self-supervised
representation learning, and clustering. These concepts are used to encode data as
interpretable vectors, from which clients train symbolic models that generate rule-based
explanations. The server then aggregates these rules through an iterative, communication-efficient
process to build a global, coherent, and transparent model. Experiments on benchmark
datasets demonstrate that AutoCoRe-FL produces accurate symbolic explanations while
achieving competitive predictive performance. Notably, it outperforms LR-XFL, the current
state-of-the-art interpretable FL baseline that relies on predefined concept supervision, in
both rule quality and classification accuracy.
PRECISE: Pre-training and Fine-tuning Sequential Recommenders with Collaborative and
Semantic Information
- Chonggang Song
- Chunxu Shen
- Hao Gu
- Yaoming Wu
- Lingling Yi
- Jie Wen
- Chuan Chen
Recommendation platforms commonly offer diverse content scenarios for users to interact
with. Pre-training models are the most commonly used approach in recommendation systems
to capture users' full-domain interests. Traditional ID-based pre-training models
mainly capture user interests by leveraging collaborative signals. However, a prevalent
drawback of those systems is the incapacity to handle cold-start scenarios. With the
recent advent of large language models, there has been a significant increase in research
efforts exploiting LLMs to extract semantic information for items. However, text-based
recommendations rely heavily on elaborate feature engineering and often fail to capture
collaborative similarities.
To overcome these limitations, we propose a novel pre-training and fine-tuning framework
for sequential recommendation, termed Precise. Precise employs a pre-training framework
that models users' comprehensive interests across all recommendation scenarios combining
collaborative signals with semantic information. To address rapidly shifting data distributions
in recommendation scenarios, we further propose a fine-tuning phase tailored to specific
target scenarios/tasks, thereby achieving efficient industrial deployment while maintaining
fast responsiveness. Additionally, we introduce practical training strategies to enhance
the model's performance in real-world applications. Empirical findings reveal that
the Precise framework attains outstanding performance in both offline experiments
and online A/B tests. Precise has been fully deployed in multiple online recommendation
scenarios in WeChat.
LinkedIn Post Embeddings: Industrial Scale Embedding Generation and Usage across LinkedIn
- Sudarshan Srinivasa Ramanujam
- Akanksha Bindal
- Yu Jiang
- Timothy J. Hazen
- David Golland
- Fengyu Zhang
- Daqi Sun
- Wanning Li
- Birjodh Singh Tiwana
- Siddharth Dangi
- Peng Yan
A post embedding (representation of text in embedding space that effectively captures
semantic meaning) is a foundational component of LinkedIn that is consumed by product
surfaces in retrieval and ranking (e.g., ranking posts in the feed or video tab).
This paper presents the post embeddings used at LinkedIn, where a pre-trained transformer-based
large language model (LLM) is taken as input and fine-tuned using multi-task learning
across a diverse set of semantic labeling tasks. We observe positive transfer, leading
to improved performance across all tasks, compared to training them independently.
The generated post embeddings outperform baseline models in zero-shot learning, demonstrating
their potential for broader applicability. Furthermore, the generated post embeddings'
performance surpasses that of OpenAI's ADA-001 and ADA-002 embeddings on LinkedIn
specific datasets and tasks. We also describe the offline evaluation methodology and
the deployment to our near-line infrastructure, which makes the post embedding available
for use within minutes of post creation for any downstream application. We present
how the embeddings were applied in the Feed product surface, in both ranking and retrieval
stages, and showcase the real world online impact to demonstrate the superior performance
of these embeddings. Finally, we also share the results of applying the embeddings
to the retrieval system of our video ranking product surface in LinkedIn. These embeddings
have been battle-tested in production at LinkedIn for over two years, consistently
powering multiple products.
Towards Explainable Transaction Risk Analysis With Dual Graph Retrieval Augmented
Generation
- Liang Su
- Mingyang Zhang
- Kangxiang Jia
- Tengfei Liu
- Weiqiang Wang
- Yun Xiong
- Xixi Wu
- Xinyu Gao
- Yongrui Fu
- Jiawei Zhang
Explainable transaction risk analysis is a challenge for traditional deep learning
models, which only predict suspicious transactions without explanations. Current explainable
methods rely on hand-crafted rules and lack the ability to automatically generate
language-based explanations. Large Language Models (LLMs) offer promise due to their
reasoning and text generation abilities but struggle with domain knowledge and hallucinations,
making risk analysis difficult. Specifically, LLMs face: (1) insufficient adaptation to transaction data analysis, and (2) ineffective knowledge retrieval methods that ignore the rich graph structure of transaction data. To address these issues,
we propose the Dual Graph Retrieval-Augmented Generation (Dual-gRAG) framework, which utilizes dual retrieval: expert knowledge and
reasoning case retrieval. Expert knowledge compensates for domain gaps, while reasoning
case retrieval provides step-wise analysis guidance. We incorporate both graph-structured
features and semantic features into the retrieval process to enhance the effectiveness
of the retrieval. Extensive experiments show that Dual-gRAG improves LLMs' risk analysis
capabilities, achieving a 15% improvement across different metrics.
Learning to Comparison-Shop
- Jie Tang
- Daochen Zha
- Xin Liu
- Huiji Gao
- Liwei He
- Stephanie Moyerman
- Sanjeev Katariya
In online marketplaces like Airbnb, users frequently engage in comparison shopping
before making purchase decisions. Despite the prevalence of this behavior, a significant
disconnect persists between mainstream e-commerce search engines and users' comparison
needs. Traditional ranking models often evaluate items in isolation, disregarding
the context in which users compare multiple items on a search results page. While
recent advances in deep learning have sought to improve ranking accuracy, diversity,
and fairness by encoding listwise context, the challenge of aligning search rankings
with user comparison shopping behavior remains inadequately addressed. In this paper,
we propose a novel ranking architecture - Learning-to-Comparison-Shop (LTCS) System
- that explicitly models and learns users' comparison shopping behaviors. Through
extensive offline and online experiments, we demonstrate that our approach yields
statistically significant gains in key business metrics - improving NDCG by 1.7% and
boosting booking conversion rate by 0.6% in A/B testing - while also enhancing user
experience. We also compare our model against state-of-the-art approaches and demonstrate
that LTCS significantly outperforms them.
Locale-Aware Product Type Prediction for E-commerce Search Queries
- Anna Tigunova
- Thomas Ricatte
- Ghadir Eraisha
Search query understanding (QU) is an important building block of modern e-commerce
search engines. QU extracts multiple intents from customer queries, including intended
color, brand, etc. One of the most important tasks in QU is predicting which product
category the user is interested in. In our work, we focus on the query product
type classification (Q2PT) task. Compared to classification of full-fledged texts,
Q2PT is more complicated because of the ambiguity of short search queries, which is
aggravated by language and cultural differences in worldwide online stores. Moreover,
the span and variety of product categories in modern marketplaces pose a significant
challenge. We focus on Q2PT inference in the global multi-locale e-commerce markets,
which need to deliver high quality user experience in both large and small local stores
alike. The common approach of training Q2PT models for each locale separately shows
significant performance drops in low-resource stores and prevents easy expansion
to a new country, where the Q2PT model has to be created from scratch. We use transfer
learning to address this challenge, augmenting low-resource locales through the vast
knowledge of the high-resource ones. We introduce a unified, locale-aware Q2PT model,
sharing training data and model structure across worldwide stores. We show that the
proposed unified locale-aware Q2PT model has superior performance over the alternatives
by conducting extensive quantitative and qualitative analysis on the large-scale multilingual
e-commerce dataset across 20 worldwide locales. Our online A/B tests have shown that
using the locale-aware model improves on the previous user experience, increasing customer
satisfaction.
GeoIndia V2: A Unified Graph and Language Model for Context-Aware Geocoding
- Arpit Tiwari
- Bhavuk Singhal
- Anshu Aditya
- Shubham Jain
- Debashis Mukherjee
- Debdoot Mukherjee
Geocoding in India presents unique challenges due to the unstructured, multilingual
and diverse nature of its address systems. While recent advances in geospatial AI
have explored the combination of spatial and semantic cues, existing methods often
fall short in effectively integrating both dimensions for robust address resolution.
In this work, we propose GeoIndia-V2, an enhanced version of GeoIndia [21], that unifies
geospatial and semantic modeling through a novel fusion framework. Our unified model
combines the Graphormer architecture [27] and a Pre-trained Transformer based Language
Model (PTLM) that is trained from scratch on proprietary Indian address data, using
our proposed Key Modulated Cross-Attention (KMCA) mechanism. KMCA enables deep cross-modal
interaction between geospatial topology and linguistic structure and allows the model
to reason contextually across both geographic and textual dimensions, effectively handling
the semantic intricacies of Indian addresses, including colloquial usage, inconsistent
formatting, and multilinguality. We leverage last-mile e-commerce delivery data to
construct a fine-grained graph of neighbourhood connectivity, enabling Graphormer
to capture rich spatial relationships. Unlike prior methods that rely on self-loops,
we generate graphs dynamically at inference time to exploit Graphormer's topological
strength. Additionally, we introduce a generative decoding strategy for predicting
hierarchical H3 cells (https://www.uber.com/en-IN/blog/h3/), moving beyond conventional
bit-wise classification approaches. To the best of our knowledge, this is the first
method to explicitly fuse graph-based geospatial learning with language-driven semantic
modeling via cross-attention in the Indian geocoding context. Our approach significantly
outperforms existing solutions and marks a substantial advancement toward building
scalable real-world geocoding systems for complex address ecosystems like India.
End-to-end Information Extraction from Archival Records with Multimodal Large Language
Models
- Mahsa Vafaie
- Sven Hertling
- Inger Banse-Strobel
- Kevin Dubout
- Harald Sack
Semi-structured Document Understanding presents a challenging research task due to
the significant variations in layout, style, font, and content of documents. This
complexity is further amplified when dealing with born-analogue historical documents, such as digitised archival records, which contain degraded
print, handwritten annotations, stamps, marginalia and inconsistent formatting resulting
from historical production and digitisation processes. Traditional approaches for
extracting information from semi-structured documents rely on manual labour, making
them costly and inefficient. This is partly due to the fact that within document collections,
there are various layout types, each requiring customised optimisation to account
for structural differences, which substantially increases the effort needed to achieve
consistent quality. The emergence of Multimodal Large Language Models (MLLMs) has
significantly advanced Document Understanding by enabling flexible, prompt-based understanding
of document images without requiring OCR outputs or layout encodings. Moreover, the encoder-decoder
architectures have overcome the limitations of encoder-only models, such as reliance
on annotated datasets and fixed input lengths. However, there still remains a gap
in effectively applying these models in real-world scenarios. To address this gap,
we first introduce BZKOpen, a new annotated dataset designed for key information extraction
from historical German index cards. Furthermore, we systematically assess the capabilities
of several state-of-the-art MLLMs, including the open-source InternVL2.0 and InternVL2.5
series and the commercial GPT-4o-mini, on the task of extracting key information from
these archival documents. Both zero-shot and few-shot prompting strategies are evaluated
across different model configurations to identify the optimal conditions for performance.
Interestingly, our results reveal that increasing model size does not necessarily
lead to better performance on this dataset. Among all models tested, the open-source
InternVL2.5-38B consistently achieves the most robust results, outperforming both
larger InternVL models and the proprietary alternative. We further provide practical
insights into prompt engineering and inference settings, offering guidance for applying
MLLMs to real-world key information extraction tasks. Additionally, we highlight the
need for more ground truth datasets that include a wider range of historical documents
with varying quality and in multiple languages, in order to fully explore the potentials
and limitations of MLLMs for key information extraction from historical records.
DinoCompanion: An Attachment-Theory Informed Multimodal Robot for Emotionally Responsive
Child-AI Interaction
- Boyang Wang
- Yuhao Song
- Jinyuan Cao
- Peng Yu
- Hongcheng Guo
- Zhoujun Li
Emotional development of children fundamentally relies on secure attachment relationships,
yet current AI companions lack the theoretical foundation to provide developmentally
appropriate emotional support. We introduce DinoCompanion, the first attachment-theory-grounded
multimodal robot for emotionally responsive child-AI interaction. We address three
critical challenges in child-AI systems: the absence of developmentally-informed AI
architectures, the need to balance engagement with safety, and the lack of standardized
evaluation frameworks for attachment-based capabilities. Our contributions include:
(i) a multimodal dataset of 128 caregiver-child dyads containing 125,382 annotated
clips with paired preference-risk labels, (ii) CARPO (Child-Aware Risk-calibrated
Preference Optimization), a novel training objective that maximizes engagement while
applying epistemic-uncertainty-weighted risk penalties, and (iii) AttachSecure-Bench,
a comprehensive evaluation benchmark covering ten attachment-centric competencies
with strong expert consensus. On AttachSecure-Bench, DinoCompanion achieves state-of-the-art performance
(57.15%), outperforming GPT-4o and Gemini-2.5-Pro, with exceptional secure base behaviors
and superior attachment risk detection. Ablations validate the critical importance
of multimodal fusion, uncertainty-aware risk modeling, and hierarchical memory for
coherent, emotionally attuned interactions.
SolarMAE: A Unified framework for Regional Centralized and Distributed Solar Power
Forecasting with Weather Pre-training
- Jin Wang
- Bingqing Peng
- Wenwei Wang
- Yuanjie Hu
- Yuejiang Chen
- Peisong Niu
- Liang Sun
The recent surge in solar plant installations has notably decreased the reliance on
fossil fuels while also presenting significant challenges to the power grid. Therefore,
the accurate forecasting of centralized and distributed solar power has become critically
important. Although site-specific forecasting models typically perform better for
utility-scale solar power plants, the model maintenance can be troublesome as the
number of solar plants grows. Furthermore, the rapid growth and difficulties in real-time
data collection associated with distributed solar systems exacerbate the complexity
of regional gross solar power forecasting. To address these issues, we propose SolarMAE, a unified regional solar power forecasting framework enabling end-to-end precise
forecasting for both centralized and distributed solar systems. It first adopts a masked autoencoder
(MAE) pre-training strategy for numerical weather prediction (NWP) reconstruction,
aiming to derive spatiotemporal correlations within meteorological variables,
and then fine-tunes a temporal convolutional neural network which predicts future
solar power generation. Experiments show that this framework outperforms state-of-the-art
centralized or distributed solar power forecasting methods in accuracy, and significantly
reduces model maintenance cost. It also demonstrates strong few-shot learning capabilities,
which are particularly useful for the cold-start problem of newly installed solar plants.
The unified solar power forecasting system has been deployed in a province in eastern
China, serving solar systems with over 73 GW gross installed capacity and more than
400 centralized solar plants.
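The sketch below illustrates the masked-autoencoder pre-training idea on flattened NWP features in PyTorch: mask a fraction of the meteorological inputs, reconstruct them, and compute the loss only on masked positions. The architecture and shapes are assumptions for illustration, not the SolarMAE model.

```python
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    """Minimal masked-autoencoder sketch over flattened NWP feature vectors."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden))
        self.decoder = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                     nn.Linear(hidden, n_features))

    def forward(self, x, mask_ratio=0.75):
        mask = (torch.rand_like(x) < mask_ratio).float()
        recon = self.decoder(self.encoder(x * (1 - mask)))
        # MAE-style objective: reconstruction error on masked positions only.
        return ((recon - x) ** 2 * mask).sum() / mask.sum().clamp(min=1)

# Usage with random stand-in data (batch of 128 sites, 32 weather variables).
model = TinyMAE(n_features=32)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = model(torch.randn(128, 32))
loss.backward()
opt.step()
```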
GCVPN: A Graph Convolutional Visual Prior-Transform Network for Actual Occluded Image
Recognition
- Lei Wang
- Nannan Wu
- Huaming Wu
- Wei Yu
- Fan Zhang
- Shuo Chen
Image recognition plays a critical role in urban security, traffic management, and
environmental monitoring, yet achieving high accuracy in obstructed scenes remains
a challenge. To address this, we propose a Graph Convolutional Visual Prior-Transform
Network (GCVPN), which significantly improves recognition accuracy and efficiency
in complex environments. GCVPN introduces an image prior slicing and topology transformer
to convert image data into graph-structured slice features, integrating domain overlap
sampling and planar mapping to handle symmetry and enable precise, rapid anomaly detection.
By combining a traditional VGG backbone with graph convolutional layers, GCVPN jointly
captures topological relationships and feature semantics, while maintaining real-time
efficiency with continuous recognition at 30 video frames per second. Extensive experiments
demonstrate its effectiveness in photovoltaic panel anomaly detection and face occlusion
recognition, highlighting strong potential for applications in intelligent surveillance
and autonomous driving.
Fraudulent Delivery Detection with Multimodal Courier Behavior Data in Last-Mile Delivery
- Shanshan Wang
- Sijing Duan
- Shuxin Zhong
- Zhiqing Hong
- Zhiyuan Zhou
- Hongyu Lin
- Weijian Zuo
- Desheng Zhang
- Yi Ding
The rapid growth of e-commerce has made last-mile delivery a critical service in daily
life. Despite regulations mandating doorstep delivery, the pressure of penalties for
delays can lead to fraudulent delivery behaviors, where couriers may report package
receipt without actually delivering the package to the assigned location. Existing studies
on fraud detection focus on exploring user (courier) behaviors. However,
due to the inaccuracy of GPS positioning and the variability
of user behavior patterns caused by dynamic environmental factors, relying solely
on behavior data remains insufficient for detecting fraudulent deliveries. In this
paper, we present a Multimodal Fraudulent Delivery Detection framework (MFDD), which
integrates heterogeneous data from multiple agents (courier-side and user-side), including
couriers' physical behavior, digital behavior, and conversations containing customer
feedback, for detecting fraudulent deliveries in last-mile delivery. We employ
attention mechanisms to extract features from each modality and use cross-modal fusion
to capture complex and varied relationships between multimodal data. To further mitigate
modality imbalance during training, we introduce a dynamic gradient-modulation strategy
that balances learning across all modalities. We implement and evaluate MFDD on real-world,
human-annotated data, achieving a 9.6% improvement in precision and a 5.8% increase
in accuracy over the state-of-the-art methods. We also deploy the model in the production
environment of JD Logistics, and results show that compared to existing methods, MFDD
improves accuracy by 15.3%, reducing estimated annual costs by over 18.5 million CNY.
Progressive Semantic Residual Quantization for Multimodal-Joint Interest Modeling
in Music Recommendation
- Shijia Wang
- Tianpei Ouyang
- Qiang Xiao
- Dongjing Wang
- Yintao Ren
- Songpei Xu
- Da Guo
- Chuanjiang Luo
In music recommendation systems, multimodal interest learning is pivotal, which allows
the model to capture nuanced preferences, including textual elements such as lyrics
and various musical attributes such as different instruments and melodies. Recently,
methods that incorporate multimodal content features through semantic IDs have achieved
promising results. However, existing methods suffer from two critical limitations:
1) intra-modal semantic degradation, where residual-based quantization processes gradually
decouple discrete IDs from original content semantics, leading to semantic drift;
and 2) inter-modal modeling gaps, where traditional fusion strategies either overlook
modal-specific details or fail to capture cross-modal correlations, hindering comprehensive
user interest modeling. To address these challenges, we propose a novel multimodal
recommendation framework with two stages. In the first stage, our Progressive Semantic
Residual Quantization (PSRQ) method generates modal-specific and modal-joint semantic
IDs by explicitly preserving the prefix semantic feature. In the second stage, to
model multimodal interest of users, a Multi-Codebook Cross-Attention (MCCA) network
is designed to enable the model to simultaneously capture modal-specific interests
and perceive cross-modal correlations. Extensive experiments on multiple real-world
datasets demonstrate that our framework outperforms state-of-the-art baselines. This
framework has been deployed on one of China's largest music streaming platforms, and
online A/B tests confirm significant improvements in commercial metrics, underscoring
its practical value for industrial-scale recommendation systems.
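For context, plain residual quantization, the procedure that PSRQ builds on, turns a content embedding into a sequence of discrete semantic IDs by quantizing successive residuals against a stack of codebooks. The sketch below shows only that baseline procedure with made-up codebook sizes; PSRQ's prefix-preserving refinement is not reproduced.

```python
import numpy as np

def residual_quantize(embedding, codebooks):
    """Map one content embedding to a list of discrete semantic IDs."""
    ids, residual = [], embedding.copy()
    for codebook in codebooks:                       # codebook: (K, d) array
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))                  # nearest code at this level
        ids.append(idx)
        residual = residual - codebook[idx]          # quantize what is left over
    return ids

rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(256, 16)) for _ in range(3)]  # 3 levels, 256 codes each
item_embedding = rng.normal(size=16)
print(residual_quantize(item_embedding, codebooks))         # three codes, one per level
```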
Retrieval-LTV: Fine-Grained Transfer Learning for Lifetime Value Estimation in Large-Scale
Industrial Retrieval
- Shirui Wang
- Shengbin Jia
- Tianyue Cao
- Shuo Yang
- Lei Jiang
- Qi He
- Lingling Yao
- Yang Xiang
In computational advertising, platforms are increasingly optimizing toward advertisers'
real assessment metrics to help achieve more reliable advertising performance. Consequently,
predicting customers' Lifetime Value (LTV) has become an essential component of the
advertising system, as it directly impacts the actual Return On Investment (ROI) of
advertisers. Recent research on LTV prediction primarily focuses on the ranking stage,
lacking consideration of the initial retrieval stage. This oversight may lead to the
inconsistency between retrieval and ranking, resulting in a loss of efficiency. Unlike
the LTV estimation in the ranking stage, the retrieval stage faces more severe data
sparsity and constraints inherent in online scoring. Incorporating rich data from
other domains can mitigate the sparsity while introducing the negative transfer issue.
To tackle these challenges, we introduce Retrieval-LTV, a two-tower retrieval model
for LTV prediction. This model employs a cooperative framework and incorporates a
fine-grained evaluation for each sample across each expert, thereby enhancing effective
selective learning from the source domain while mitigating the risk of negative transfer.
Additionally, we have designed a specialized representation transformation to obtain
the LTV-oriented score for online retrieval. Experiments on three real-world industrial
datasets demonstrate that Retrieval-LTV outperforms all the baselines, achieving superior
performance. An online A/B test further confirms the effectiveness of Retrieval-LTV,
increasing the overall LTV by 2.08%. As a result, Retrieval-LTV has now been fully
deployed in Tencent Ads.
You Only Evaluate Once: A Tree-based Rerank Method at Meituan
- Shuli Wang
- Yinqiu Huang
- Changhao Li
- Yuan Zhou
- Yonggang Liu
- Yongqiang Zhang
- Yinhua Zhu
- Haitao Wang
- Xingxing Wang
Reranking plays a crucial role in modern recommender systems by capturing the mutual
influences within the list. Due to the inherent challenges of combinatorial search
spaces, most methods adopt a two-stage search paradigm: a simple General Search Unit
(GSU) efficiently reduces the candidate space, and an Exact Search Unit (ESU) effectively
selects the optimal sequence. These methods essentially involve making trade-offs
between effectiveness and efficiency, while suffering from a severe inconsistency problem, that is, the GSU often misses high-value lists from ESU. To address this problem,
we propose YOLOR, a one-stage reranking method that removes the GSU while retaining
only the ESU. Specifically, YOLOR includes: (1) a Tree-based Context Extraction Module
(TCEM) that hierarchically aggregates multi-scale contextual features to achieve "list-level
effectiveness", and (2) a Context Cache Module (CCM) that enables efficient feature
reuse across candidate permutations to achieve "permutation-level efficiency". Extensive
experiments across public and industry datasets validate YOLOR's performance and we
have successfully deployed YOLOR on the Meituan food delivery platform.
FinSage: A Multi-aspect RAG System for Financial Filings Question Answering
- Xinyu Wang
- Jijun Chi
- Zhenghan Tai
- Tung Sum Thomas Kwok
- Hailin He
- Zhuhong Li
- Yuchen Hua
- Muzhi Li
- Peng Lu
- Suyucheng Wang
- Yihong Wu
- Huang Jerry
- Jingrui Tian
- Fengran Mo
- Yufei Cui
- Ling Zhou
Leveraging large language models in real-world settings often requires domain-specific
data and tools in order to comply with the complex regulations governing acceptable
use. Within financial sectors, modern enterprises increasingly
rely on Retrieval-Augmented Generation (RAG) systems to address complex information
retrieval in financial document workflows. However, existing solutions struggle to
account for the inherent heterogeneity of data (e.g., text, tables, diagrams) and
evolving complexity in financial filings, leading to compromised accuracy in critical
information extraction. We propose FinSage as a solution: a multi-aspect RAG framework tailored for data
retrieval and summarization in multi-modal financial documents. Our framework introduces
three innovative components: (1) a multi-modal pre-processing pipeline that unifies
diverse data formats and generates chunk-level metadata summaries, (2) a multi-path
sparse-dense retrieval system augmented with query expansion (HyDE) and metadata-aware
semantic search, and (3) a domain-specialized re-ranking module fine-tuned via Direct
Preference Optimization to prioritize ground-truth-related content. Extensive experiments
demonstrate that FinSage achieves an impressive recall of 92.51% on 75 expert-curated questions derived from
financial filings, and surpasses the best baseline method on the FinanceBench question answering datasets
by 24.06% in accuracy. Moreover, FinSage has been successfully deployed as a financial question-answering system in online meetings,
where it has already served more than 1,200 people. The implementation is publicly
available at https://github.com/simplew4y/finsage.
EduCraft: A System for Generating Pedagogical Lecture Scripts from Long-Context Multimodal
Presentations
- Yucheng Wang
- Jifan Yu
- Daniel Zhang-Li
- Joy Jia Yin Lim
- Shangqing Tu
- Haoxuan Li
- Zhiyuan Liu
- Huiqin Liu
- Lei Hou
- Juanzi Li
- Bin Xu
Educators face substantial workload pressures, with significant time invested in preparing
teaching materials. Generating high-quality lecture scripts from multimodal presentations
is a particularly demanding aspect of this preparation. This paper introduces EduCraft, a novel system designed to automate Lecture Script Generation (LSG), addressing key
difficulties such as comprehensive multimodal understanding, long-context coherence,
and instructional design efficacy. EduCraft features a modular architecture comprising:
(1) a Multimodal Input Processing pipeline for robust data extraction and association
from slides; (2) a core Lecture Script Generation Engine with instruction-guided VLM
and Caption+LLM workflows for pedagogical synthesis; (3) an optional Knowledge Augmentation
Module using Retrieval-Augmented Generation (RAG) for enhanced factual grounding;
and (4) a Model Integration and Deployment Interface supporting diverse AI models
and providing a deployable API. Extensive evaluations, including human assessments
and a new automated evaluation framework, demonstrate that EduCraft significantly
outperforms strong baselines and teacher-refined scripts in producing coherent, readable,
and pedagogically sound lecture scripts. By effectively tackling core LSG challenges,
EduCraft offers a practical, configurable solution to reduce educator workload and
enhance educational content creation. We open-source EduCraft at https://github.com/wyuc/EduCraft.
CSRM-LLM: Embracing Multilingual LLMs for Cold-Start Relevance Matching in Emerging
E-commerce Markets
- Yujing Wang
- Yiren Chen
- Huoran Li
- Chunxu Xu
- Yuchong Luo
- Xianghui Mao
- Cong Li
- Lun Du
- Chunyang Ma
- Qiqi Jiang
- Yin Wang
- Fan Gao
- Wenting Mo
- Pei Wen
- Shantanu Kumar
- Taejin Park
- Yiwei Song
- Vijay Rajaram
- Tao Cheng
- Sonu Durgia
- Pranam Kolari
As global e-commerce platforms continue to expand, companies are entering new markets
where they encounter cold-start challenges due to limited human labels and user behaviors.
In this paper, we share our experiences in Coupang to provide a competitive cold-start
performance of relevance matching for emerging e-commerce markets. Specifically, we
present a Cold-Start Relevance Matching (CSRM) framework, utilizing a multilingual Large Language Model (LLM) to address
three challenges: (1) activating cross-lingual transfer learning abilities of LLMs
through machine translation tasks; (2) enhancing query understanding and incorporating
e-commerce knowledge by retrieval-based query augmentation; (3) mitigating the impact
of training label errors through a multi-round self-distillation training strategy.
Our experiments demonstrate the effectiveness of CSRM-LLM and the proposed techniques,
resulting in successful real-world deployment and significant online gains, with a
45.8% reduction in defect ratio and a 0.866% uplift in session purchase rate.
Audience-Aware and Self-Adaptive Multi-Interest Modeling for Sharing Rate Prediction
in Affiliate Marketing
- Zhe Wang
- Ziyu Guan
- Yujian Cao
- Yaming Yang
- Rui Wang
- Bin Tong
- Wei Zhao
- Hongbo Deng
Affiliate marketing, a component of modern digital marketing, leverages partnerships
among merchants, promoters, and consumers to enhance item visibility and drive sales.
Promoters act as critical intermediaries, sharing items with their communities to
promote items while earning commissions. Accurate prediction of the sharing rate of
promoters enables platforms to optimize recommendation performance, thereby improving
promotional efficiency. However, existing related methods are mainly designed for
consumer-oriented scenarios (C-end), and face significant limitations in modeling
the promoters (B-end), which are typically characterized by audience group attachment.
Specifically, three core challenges emerge: (1) how to organically integrate audience
preferences while maintaining promoter dominance, (2) how to accommodate promoters'
diverse interest scopes, and (3) how to capture the complex one-to-many relationships
between promoters and their audiences. For Challenge (1), we employ a dynamic routing
mechanism based on interest capsules to model the diverse interests of promoters,
where audience groups are used to optimize the interest routing via a novel dual-channel
attention mechanism, thus allowing audience groups to explicitly participate in the
promoter decision-making process with an auxiliary role. For Challenge (2), a parameter-free,
confidence-aware interest activation mechanism is introduced to adaptively select
sparse interest capsules. For Challenge (3), we pioneer the use of hypergraphs in
CTR prediction to model one-to-many relationships between promoters and audiences.
Extensive experiments are conducted on two real-world datasets to validate the effectiveness
of our approach. Furthermore, the model is deployed on the Alimama platform, which
hosts over 100,000 promoters. Online A/B testing results demonstrate that our method
achieves a 5.31% average improvement over online baselines.
Dynamic Network-Based Two-Stage Time Series Forecasting for Affiliate Marketing
- Zhe Wang
- Yaming Yang
- Ziyu Guan
- Bin Tong
- Rui Wang
- Wei Zhao
- Hongbo Deng
In recent years, affiliate marketing has emerged as a revenue-sharing strategy where
merchants collaborate with promoters to promote their products. It not only increases
product exposure but also allows promoters to earn a commission. This paper addresses
the pivotal yet under-explored challenge in affiliate marketing: accurately assessing
and predicting the contributions of promoters in product promotion. We design a novel
metric for evaluating the indirect contributions of the promoter, called propagation
scale. Unfortunately, existing time series forecasting techniques fail to deliver
accurate predictions due to the propagation scale being influenced by multiple factors
and the inherent complexities arising from dynamic scenarios. To address this issue,
we decouple the network structure from the node signals and propose a two-stage solution:
initially, the basic self-sales and network structure prediction are conducted separately,
followed by the synthesis of the propagation scale. Specifically, we design a graph
convolution encoding scheme based on descendant neighbors and incorporate hypergraph
convolution to efficiently capture complex promotional dynamics. Additionally, three
auxiliary tasks are employed: self-sales prediction for base estimations, descendant
prediction to synthesize propagation scale, and promoter activation prediction to
mitigate high volatility issues. Extensive offline experiments on large-scale industrial
datasets validate the superiority of our method. We further deploy our model on Alimama
platform with over 100,000 promoters, achieving a 9.29% improvement in GMV and a 5.89%
increase in sales volume.
Leveraging Generative Models for Real-Time Query-Driven Text Summarization in Large-Scale
Web Search
- Zeyu Xiong
- Yixuan Nan
- Li Gao
- Hengzhu Tang
- Shuaiqiang Wang
- Junfeng Wang
- Dawei Yin
In the dynamic landscape of large-scale web search, Query-Driven Text Summarization
(QDTS) aims to generate concise and informative summaries from textual documents based
on a given query, which is essential for improving user engagement and facilitating
rapid decision-making. Traditional extractive summarization models, based primarily
on ranking candidate summary segments, have been the dominant approach in industrial
applications. However, these approaches suffer from two key limitations: 1) The multi-stage
pipeline often introduces cumulative information loss and architectural bottlenecks
due to its weakest component; 2) Traditional models lack sufficient semantic understanding
of both user queries and documents, particularly when dealing with complex search
intents. In this study, we propose a novel framework to pioneer the application of
generative models to address real-time QDTS in industrial web search. Our approach
integrates large model distillation, supervised fine-tuning, direct preference optimization,
and lookahead decoding to transform a lightweight model with only 0.1B parameters
into a domain-specialized QDTS expert. Evaluated on multiple industry-relevant metrics,
our model outperforms the production baseline and achieves a new state of the art.
Furthermore, it demonstrates excellent deployment efficiency, requiring only 334 NVIDIA
L20 GPUs to handle ~50,000 queries per second under 55 ms average latency per query.
Climber: Toward Efficient Scaling Laws for Large Recommendation Models
- Songpei Xu
- Shijia Wang
- Da Guo
- Xianwen Guo
- Qiang Xiao
- Bin Huang
- Guanlin Wu
- Chuanjiang Luo
Transformer-based generative models have achieved remarkable success across domains
with various scaling law manifestations. However, our extensive experiments reveal
persistent challenges when applying Transformer to recommendation systems: (1) Transformer
scaling is not ideal with increased computational resources, due to structural incompatibilities
with recommendation-specific features such as multi-source data heterogeneity; (2)
critical online inference latency constraints (tens of milliseconds) that intensify
with longer user behavior sequences and growing computational demands. We propose
Climber, an efficient recommendation framework comprising two synergistic components:
the model architecture for efficient scaling and the co-designed acceleration techniques.
Our proposed model adopts two core innovations: (1) multi-scale sequence extraction
that achieves a time complexity reduction by a constant factor, enabling more efficient
scaling with sequence length; (2) dynamic temperature modulation adapting attention
distributions to the multi-scenario and multi-behavior patterns. Complemented by acceleration
techniques, Climber achieves a 5.15× throughput gain without performance degradation
by adopting a "single user, multiple item" batched processing and memory-efficient
Key-Value caching.
Comprehensive offline experiments on multiple datasets validate that Climber exhibits
a more ideal scaling curve. To our knowledge, this is the first publicly documented
framework where controlled model scaling drives continuous online metric growth (12.19%
overall lift) without prohibitive resource costs. Climber has been successfully deployed
on Netease Cloud Music, one of China's largest music streaming platforms, serving
tens of millions of users daily.
HIT Model: A Hierarchical Interaction-Enhanced Two-Tower Model for Pre-Ranking Systems
- Haoqiang Yang
- Congde Yuan
- Kun Bai
- Mengzhuo Guo
- Wei Yang
- Chao Zhou
Online display advertising platforms rely on pre-ranking systems to efficiently filter
and prioritize candidate ads from large corpora, balancing relevance to users with
strict computational constraints. The prevailing two-tower architecture, though highly
efficient due to its decoupled design and pre-caching, suffers from cross-domain interaction
and coarse similarity metrics, undermining its capacity to model complex user-ad relationships.
In this study, we propose the Hierarchical Interaction-Enhanced Two-Tower (HIT) model,
a new architecture that augments the two-tower paradigm with two key components: generators that pre-generate holistic vectors incorporating coarse-grained user-ad interactions
through a dual-generator framework with a cosine-similarity-based generation loss
as the training objective, and multi-head representers that project embeddings into multiple latent subspaces to capture fine-grained, multi-faceted
user interests and multi-dimensional ad attributes. This design enhances modeling
effectiveness without compromising inference efficiency. Extensive experiments on
public datasets and large-scale online A/B testing on Tencent's advertising platform
demonstrate that HIT significantly outperforms several baselines in relevance metrics,
yielding a 1.66% increase in Gross Merchandise Volume and a 1.55% improvement in Return
on Investment, alongside similar serving latency to the vanilla two-tower models.
The HIT model has been successfully deployed in Tencent's online display advertising
system, serving billions of impressions daily. The code is available at https://github.com/HarveyYang123/HIT_model.
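As background for the architecture discussed above, the following PyTorch sketch shows a plain two-tower scorer whose item-side embeddings can be pre-computed and cached; HIT's generators and multi-head representers are not reproduced here, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTower(nn.Module):
    """Vanilla two-tower pre-ranking scorer (baseline structure, not HIT)."""
    def __init__(self, user_dim, ad_dim, hidden=128, out=64):
        super().__init__()
        self.user_tower = nn.Sequential(nn.Linear(user_dim, hidden), nn.ReLU(),
                                        nn.Linear(hidden, out))
        self.ad_tower = nn.Sequential(nn.Linear(ad_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, out))

    def forward(self, user_feats, ad_feats):
        u = F.normalize(self.user_tower(user_feats), dim=-1)
        a = F.normalize(self.ad_tower(ad_feats), dim=-1)
        # Ad embeddings can be pre-computed and cached offline; only the dot
        # product runs online, which keeps serving latency low.
        return (u * a).sum(-1)

scores = TwoTower(32, 48)(torch.randn(8, 32), torch.randn(8, 48))
```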
CheckDAPR: An MLLM-based Sketch Analysis System for Draw-A-Person-in-the-Rain Assessments
- Migyeong Yang
- Chaehee Park
- Taeeun Kim
- Hayeon Song
- Jinyoung Han
Sketch-based drawing assessments in art therapy are commonly used to understand the
cognitive and psychological states of individuals. In conjunction with self-report
measures, drawing assessments serve to enhance insights into an individual's psychological
state. However, interpreting the drawing assessments is labor-intensive and substantially
reliant on the experience of the art therapists. While a few automated approaches
for analyzing drawing-based assessments have been proposed to remedy this issue, they
mostly rely on existing object detection methods, where complex drawing attributes
cannot be accurately decoded. To overcome these challenges, we propose a novel and
comprehensive Draw-A-Person-in-the-Rain (DAPR) analysis system, CheckDAPR, which utilizes a Multimodal Large Language Model (MLLM) with object detection methods
for in-depth evaluation. Our experimental results show the promising performance of
CheckDAPR and its ability to reduce analysis time for art therapists, indicating its potential
to aid professionals in art therapy.
DAS: Dual-Aligned Semantic IDs Empowered Industrial Recommender System
- Wencai Ye
- Mingjie Sun
- Shaoyun Shi
- Peng Wang
- Wenjin Wu
- Peng Jiang
Semantic IDs are discrete identifiers generated by quantizing Multi-modal Large
Language Model embeddings, enabling efficient multi-modal content integration in
recommendation systems. However, their lack of collaborative signals results in a
misalignment with downstream discriminative and generative recommendation objectives.
Recent studies have introduced various alignment mechanisms to address this problem,
but their two-stage framework design still leads to two main limitations: (1) inevitable
information loss during alignment, and (2) inflexibility in applying adaptive alignment
strategies, consequently constraining the mutual information maximization during the
alignment process.
To address these limitations, we propose a novel and flexible one-stage Dual-Aligned
Semantic IDs (DAS) method that simultaneously optimizes quantization and alignment,
preserving semantic integrity and alignment quality while avoiding the information
loss typically associated with two-stage methods. Meanwhile, DAS achieves more efficient
alignment between the semantic IDs and collaborative signals, with the following two
innovative and effective approaches: (1) Multi-view Contrastive Alignment: To maximize
mutual information between semantic IDs and collaborative signals, we first incorporate
an ID-based CF debias module, and then design three effective contrastive alignment
methods: dual user-to-item (u2i), dual item-to-item/user-to-user (i2i/u2u), and dual
co-occurrence item-to-item/user-to-user (i2i/u2u). (2) Dual Learning: By aligning
the dual quantizations of users and ads, the constructed semantic IDs for users and
ads achieve stronger alignment. Finally, offline experiments and online A/B tests
confirm DAS's efficacy; DAS now serves 400M+ users on Kuaishou's ad platform.
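To make the idea of contrastive alignment between semantic IDs and collaborative signals concrete, here is a minimal in-batch InfoNCE-style sketch (an illustration only, not the DAS losses or debias module; all names, shapes, and the temperature are assumptions):

    import torch
    import torch.nn.functional as F

    def info_nce(anchor, positive, temperature=0.07):
        """In-batch contrastive loss: the i-th anchor should match the i-th positive."""
        anchor = F.normalize(anchor, dim=-1)
        positive = F.normalize(positive, dim=-1)
        logits = anchor @ positive.t() / temperature        # (B, B) similarity matrix
        labels = torch.arange(anchor.size(0), device=anchor.device)
        return F.cross_entropy(logits, labels)

    # Toy tensors standing in for semantic-ID embeddings and collaborative embeddings.
    user_sid = torch.randn(64, 128, requires_grad=True)   # user semantic-ID embeddings
    item_cf = torch.randn(64, 128)                        # item collaborative embeddings
    loss_u2i = info_nce(user_sid, item_cf)                # one alignment direction
    loss_i2u = info_nce(item_cf, user_sid)                # the dual direction
    (loss_u2i + loss_i2u).backward()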
InterFormer: Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction
- Zhichen Zeng
- Xiaolong Liu
- Mengyue Hang
- Xiaoyi Liu
- Qinghai Zhou
- Chaofei Yang
- Yiqun Liu
- Yichen Ruan
- Laming Chen
- Yuxin Chen
- Yujia Hao
- Jiaqi Xu
- Jade Nie
- Xi Liu
- Buyun Zhang
- Wei Wen
- Siyang Yuan
- Hang Yin
- Xin Zhang
- Kai Wang
- Wen-Yen Chen
- Yiping Han
- Huayu Li
- Chunzhi Yang
- Bo Long
- Philip S. Yu
- Hanghang Tong
- Jiyan Yang
Click-through rate (CTR) prediction, which predicts the probability of a user clicking
an ad, is a fundamental task in recommender systems. The emergence of heterogeneous
information, such as user profile and behavior sequences, depicts user interests from
different aspects. A mutually beneficial integration of heterogeneous information
is the cornerstone of successful CTR prediction. However, most existing
methods suffer from two fundamental limitations: (1) insufficient inter-mode interaction due to the unidirectional information flow between modes, and (2) aggressive information aggregation caused by early summarization, resulting in excessive information loss. To address
these limitations, we propose a novel module named InterFormer to learn heterogeneous
information interaction in an interleaving style. To achieve better interaction learning, InterFormer enables bidirectional information flow for mutually beneficial learning across different
modes. To avoid aggressive information aggregation, we retain complete information
in each data mode and use a separate Cross Arch for effective information selection
and summarization. InterFormer has been deployed across multiple platforms at Meta Ads, achieving 0.15% performance
gain and 24% QPS gain compared to prior state-of-the-art models and yielding sizable
online impact.
Meta-Adaptive Network for Effective Cold-Start Recommendation via Warm-Aware Representation
Learning
- Ao Zhang
- Boya Du
- Yulin Xu
- Jialin Zhu
- Yuning Jiang
Click-Through Rate (CTR) prediction models enable users to discover matched items
in recommender systems. Industrial-scale models typically adopt a unified embedding
approach for both hot and cold items. However, existing embedding-based models exhibit
limitations in representation learning for cold items due to sparse historical user
interactions. In this paper, we propose a Meta-Adaptive Network for Effective Cold-Start Recommendation (MANE). Inspired by meta-learning, we develop a lightweight plug-and-play meta-learner
that generates enhanced representations to model full-lifecycle representations for
cold items. Our meta-network dynamically adjusts the contribution of generalized features
in final representations as item exposure increases, enabling adaptive balancing between
generalization and specificity for cold items. In addition, certain high-potential
items in cold-start scenarios face challenges in effective exposure due to limited
interaction signals. Therefore, we further propose a novel representation learning
method that incorporates a warm-aware contrastive loss, which aligns the representations
of cold items with those of hot items exhibiting high multimodal similarity. Experimental
results on the Taobao production dataset and online A/B testing validate the effectiveness
of our method, achieving a 4.34% improvement in item page views (IPV) and a 2.84% improvement in CTR.
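The warm-aware contrastive idea, aligning a cold item with hot items of high multimodal similarity, can be sketched as follows (a toy illustration, not the authors' loss; the single-positive choice, temperature, and tensor shapes are assumptions):

    import torch
    import torch.nn.functional as F

    def warm_aware_contrastive(cold_emb, hot_emb, mm_sim, temperature=0.1):
        """Pull each cold item toward the hot item with the highest multimodal
        similarity (positive), pushing it away from the remaining hot items.
        mm_sim: (num_cold, num_hot) multimodal similarity matrix."""
        cold = F.normalize(cold_emb, dim=-1)
        hot = F.normalize(hot_emb, dim=-1)
        logits = cold @ hot.t() / temperature              # (num_cold, num_hot)
        positives = mm_sim.argmax(dim=1)                   # index of the most similar hot item
        return F.cross_entropy(logits, positives)

    # Toy usage with random stand-ins.
    cold = torch.randn(16, 64, requires_grad=True)
    hot = torch.randn(100, 64)
    mm_sim = torch.rand(16, 100)                           # e.g. cosine sim of multimodal features
    loss = warm_aware_contrastive(cold, hot, mm_sim)
    loss.backward()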
Edge-Variational Graph Neural Networks: Harnessing Weak Ties for Enhanced Default
Risk Prediction
- Feng Zhang
- Jianfeng Chi
- Rui Ma
- Gang Chen
- Rongqi Chen
Default risk prediction (DRP) leveraging financial relational networks (FRNs) has
seen extensive application in recent years. Connection confidence within an FRN is
pivotal in ensuring the efficacy of FRN-based DRP, with strong connections offering
a more substantial predictive impact and weak connections providing supplementary
predictive insights. However, in practical DRP implementations, the confidence in
both strong and weak connections may be compromised due to fraudulent activities and
misinformation caused by data collection biases, necessitating a novel method to effectively
assess and calibrate low-confidence connections within the FRN to enhance DRP. To
address this challenge, we propose a novel method named Edge-Variational Graph Neural
Networks (EVGNN). During the graph encoding phase, EVGNN employs variational inference
to assess connection confidence within the FRN, eliminating low-confidence connections
and reinforcing weak connections with significant predictive value. In the decoding
phase, EVGNN recodes network nodes based on the calibrated FRN structure, yielding
refined node representations that bolster DRP. Empirical evaluation based on public
datasets and a large-scale FRN dataset from a real-world DRP scenario validates the
effectiveness of the proposed method in identifying connection confidence and affirms
the validity of utilizing calibrated FRN to enhance DRP.
Augmenting Guest Search Results with Recommendations at Airbnb
- Haowei Zhang
- Philbert Lin
- Dishant Ailawadi
- Soumyadip Banerjee
- Shashank Dabriwal
- Hao Li
- Kedar Bellare
- Liwei He
- Sanjeev Katariya
Users on Airbnb often perform exhaustive searches with varying conditions to find
suitable accommodations. However, overly narrow search criteria can lead to insufficient
results, causing frustration and abandonment of search journeys. To address these
challenges, we developed flexible pivot recommendations that dynamically augment search
results by suggesting alternative dates, relaxing amenity requirements, or adjusting
price constraints. These recommendations align with users' broader travel intent,
resulting in a measurable improvement in booking rates on the platform.
Our solution introduces two key innovations: (1) a modular and extensible architecture
to generate the flexible pivot recommendations that integrates seamlessly with Airbnb's
existing search ranking system, enabling rapid iteration and minimizing maintenance
overhead; and (2) an efficient approach leveraging transfer learning and a Mixture
of Experts (MoE) architecture to rank recommendations alongside organic search results.
This approach handles diverse scenarios, from single to multiple recommendations,
while addressing cold-start challenges and supporting ongoing enhancements. Our solution's
scalability and generalizability make it applicable to industries such as online travel
agencies and e-commerce platforms, where users benefit from more diverse, intent-aligned
recommendations.
Maps Ranking Optimization in Airbnb
- Hongwei Zhang
- Malay Haldar
- Kedar Bellare
- Sherry Chen
- Soumyadip Banerjee
- Xiaotang Wang
- Mustafa Abdool
- Huiji Gao
- Pavan Tapadia
- Liwei He
- Sanjeev Katariya
- Stephanie Moyerman
Search results on Airbnb are presented in two user interfaces: a list of rectangle
cards, referred to as the feed result, that include photos, prices, ratings, and other
details of the listings, and listing pins on the map, referred to as the map result,
which can either display as price pins or appear without prices as mini-pins. The
map plays a key role in Airbnb. Not only does it display the location of the listings
in the search results, but it also serves as an interactive user interface that allows
users to view the details of the pins and perform new searches by moving or zooming
on the map. The majority of searches on Airbnb are conducted using the map, and the majority
of bookings are related to listings shown in map search results. Limited research
has been conducted within the industry to address the unique challenges of maps ranking.
For years, it was assumed that showing the top K results based on the model designed
for feed ranking was also optimal for the map. However, this assumption simply breaks
down when we take a closer look at the NDCG (Normalized Discounted Cumulative Gain)
metric. Attention is key to NDCG, and the attention flow on the feed does not apply
to the map. In this paper, we will begin with the NDCG theory and redesign map-specific
NDCG by introducing three types of attention factors. We conducted a series of experiments
to test whether optimizing map NDCG could drive the booking rates on Airbnb, and the
results strongly supported this hypothesis.
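As a loose illustration of replacing the feed's position-based discount with attention factors, consider the following sketch (the paper's three attention factors and their estimation are not described here; the formula and the example weights are assumptions):

    import numpy as np

    def map_dcg(relevances, attention):
        """DCG variant where the usual 1/log2(rank+1) position discount is
        replaced by an attention factor per result (e.g., depending on pin
        type and position on the map). Both arrays are in ranked order."""
        relevances = np.asarray(relevances, dtype=float)
        attention = np.asarray(attention, dtype=float)
        return float(((2.0 ** relevances - 1.0) * attention).sum())

    def map_ndcg(relevances, attention):
        dcg = map_dcg(relevances, attention)
        # Ideal ordering for normalization: best items get the highest attention.
        ideal = map_dcg(sorted(relevances, reverse=True), sorted(attention, reverse=True))
        return dcg / ideal if ideal > 0 else 0.0

    # Toy example: price pins get higher assumed attention than mini-pins.
    rels = [3, 1, 0, 2]
    attn = [0.9, 0.9, 0.3, 0.3]   # hypothetical attention factors per displayed pin
    print(map_ndcg(rels, attn))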
D3-TR: Data-driven Daily Delivery Task Rescheduling for Cost-effective Last-mile Delivery
- Lidi Zhang
- Yinfeng Xiang
- Wenjun Lyu
- Zhiqing Hong
- Haotian Wang
- Desheng Zhang
- Yunhuai Liu
- Tian He
In last-mile logistics, couriers are typically assigned fixed zones to perform door-to-door
deliveries. In practice, packages in some delivery zones might not be fulfilled on
time due to couriers taking irregular leave for sickness or higher-priority task assignments,
e.g., services for VIPs and regulatory training. Beyond the costly real-world practice
of hiring temporary workers, analysis of historical data reveals that daily delivery
task rescheduling among on-duty couriers can be a cost-effective and efficient alternative.
It involves individual workload assessments and delivery task assignments, both of
which existing methods cannot address adequately: (i) Existing courier workload assessment
methods are not tailored for downstream optimization tasks, leading to poor performance.
(ii) Efficiency-oriented task assignment methods may lead to unfair workload among
the couriers. To address the above two limitations, in this paper, we propose D3-TR, a data-driven method for task reassignment among present couriers. Firstly, we
design a consistency-guided predictor that can quickly and precisely predict the workload
of couriers. Secondly, based on this predictor, we design a workload-aware genetic
algorithm to solve the optimal task allocation problem. Experimental results underscore
the superiority of our method over several baselines. Furthermore, real-world deployment
on millions of orders demonstrates the effectiveness of our solution, yielding an
average of 3.9% improvement in the on-time delivery rate.
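A minimal sketch of a workload-aware genetic algorithm over task-to-courier assignments is shown below (an illustration under assumed inputs, not the paper's algorithm; the fitness combining makespan with a workload-variance fairness penalty, and all parameters, are assumptions):

    import random
    import statistics

    def fitness(assignment, task_load, num_couriers, fairness_weight=1.0):
        """Lower is better: the heaviest courier's workload plus a fairness
        penalty (variance of per-courier workload). task_load[i] is the
        predicted workload of task i."""
        per_courier = [0.0] * num_couriers
        for task, courier in enumerate(assignment):
            per_courier[courier] += task_load[task]
        return max(per_courier) + fairness_weight * statistics.pvariance(per_courier)

    def workload_aware_ga(task_load, num_couriers, pop_size=50, generations=200,
                          mutation_rate=0.05, seed=0):
        rng = random.Random(seed)
        num_tasks = len(task_load)
        pop = [[rng.randrange(num_couriers) for _ in range(num_tasks)]
               for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=lambda a: fitness(a, task_load, num_couriers))
            survivors = pop[: pop_size // 2]
            children = []
            while len(survivors) + len(children) < pop_size:
                p1, p2 = rng.sample(survivors, 2)
                cut = rng.randrange(1, num_tasks)
                child = p1[:cut] + p2[cut:]                 # one-point crossover
                for i in range(num_tasks):                  # random mutation
                    if rng.random() < mutation_rate:
                        child[i] = rng.randrange(num_couriers)
                children.append(child)
            pop = survivors + children
        return min(pop, key=lambda a: fitness(a, task_load, num_couriers))

    # Toy usage: 20 tasks with predicted workloads, 4 on-duty couriers.
    loads = [random.uniform(0.5, 2.0) for _ in range(20)]
    best = workload_aware_ga(loads, num_couriers=4)
    print(best)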
Billion-Scale Graph Deep Learning Framework for Ads Recommendation
- Si Zhang
- Weilin Cong
- Dongqi Fu
- Andrey Malevich
- Hao Wu
- Baichuan Yuan
- Xin Zhou
- Kaveh Hassani
- Zhigang Hua
- Austin Derrow-Pinion
- Yan Xie
- Xuewei Wang
- Yinglong Xia
- Ning Yao
- Vena Li
- Sem Park
- Bo Long
In this paper, we systematically present BHG, a graph deep learning framework for daily users' ads recommendations. BHG mainly
relies on two pillars: (1) graph tokenization to convert the input temporal heterogeneous
graph into sequences of tokens, and (2) graph MLP-Mixer neural architecture to learn
node representations on sequences of tokens via a mini-batch manner. In general, BHG
embraces three advantages: (1) flexibility, i.e., BHG can be seamlessly integrated
with any existing industrial recommendation model by treating the learned node embeddings
as additional features that encode interactions, (2) efficiency, i.e., the graph tokenization
allows sampling the neighborhood both locally and globally, and reduces the number
of nodes considered for aggregations, and (3) model simplicity, i.e., the graph MLP-Mixer
does not require self-attention for aggregating nodes, which keeps the architecture simple.
We demonstrate the superior performance of the proposed BHG on two internal datasets
and one public dataset. We hope this paper can share insights and explain large-scale
graph deep learning deployments for researchers, engineers, and practitioners.
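The notion of graph tokenization, turning a node's sampled temporal heterogeneous neighborhood into a flat token sequence, might look roughly like the following toy sketch (not BHG's tokenizer; the local/global sampling rule and token format are assumptions):

    from collections import defaultdict

    def tokenize_node(center, edges, max_local=8, max_global=4):
        """Toy graph tokenization: turn a node's temporal heterogeneous
        neighborhood into a flat token sequence. `edges` maps a node to a list
        of (neighbor, node_type, timestamp) tuples. Local tokens come from the
        most recent 1-hop events; 'global' tokens are sampled from 2-hop neighbors."""
        tokens = [("CENTER", center, None)]
        one_hop = sorted(edges.get(center, []), key=lambda e: -e[2])[:max_local]
        for nbr, ntype, ts in one_hop:
            tokens.append((ntype, nbr, ts))
        two_hop = []
        for nbr, _, _ in one_hop:
            two_hop.extend(edges.get(nbr, []))
        for nbr, ntype, ts in sorted(two_hop, key=lambda e: -e[2])[:max_global]:
            tokens.append(("GLOBAL_" + ntype, nbr, ts))
        return tokens

    # Toy temporal heterogeneous graph: user -> clicked ads, ad -> clicking users.
    edges = defaultdict(list)
    edges["user_1"] = [("ad_7", "AD", 100), ("ad_3", "AD", 95), ("ad_9", "AD", 40)]
    edges["ad_7"] = [("user_5", "USER", 99), ("user_1", "USER", 100)]
    print(tokenize_node("user_1", edges))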
FLAIR: Feedback Learning for Adaptive Information Retrieval
- William Zhang
- Yiwen Zhu
- Yunlei Lu
- Mathieu Demarne
- Wenjing Wang
- Kai Deng
- Nutan Sahoo
- Katherine Lin
- Miso Cilimdzic
- Subru Krishnan
Recent advances in Large Language Models (LLMs) have driven the adoption of copilots
in complex technical scenarios, underscoring the growing need for specialized information
retrieval solutions. In this paper, we introduce FLAIR, a lightweight, feedback learning
framework that adapts copilot systems' retrieval strategies by integrating domain-specific
expert feedback. FLAIR operates in two stages: an offline phase obtains indicators
from (1) user feedback and (2) questions synthesized from documentation, storing these
indicators in a decentralized manner. An online phase then employs a two-track ranking
mechanism to combine raw similarity scores with the collected indicators. This iterative
setup refines retrieval performance for any query. Extensive real-world evaluations
of FLAIR demonstrate significant performance gains on both previously seen and unseen
queries, surpassing state-of-the-art approaches. The system has been successfully
integrated into Copilot DECO, serving thousands of users at Microsoft, demonstrating
its scalability and effectiveness in operational environments.
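A highly simplified sketch of a two-track ranking step, blending a raw similarity score with an offline feedback indicator, is shown below (illustrative only; the weighting scheme, indicator definition, and document IDs are assumptions):

    def two_track_rank(candidates, alpha=0.7):
        """Toy two-track ranking: blend the raw retrieval similarity with a
        feedback indicator collected offline (e.g., how often experts confirmed
        the document answered similar questions). `candidates` is a list of
        dicts with 'doc_id', 'similarity', and 'indicator' in [0, 1]."""
        scored = [
            (alpha * c["similarity"] + (1 - alpha) * c["indicator"], c["doc_id"])
            for c in candidates
        ]
        return [doc for _, doc in sorted(scored, reverse=True)]

    candidates = [
        {"doc_id": "kb/restart-cluster", "similarity": 0.82, "indicator": 0.10},
        {"doc_id": "kb/quota-errors",    "similarity": 0.78, "indicator": 0.95},
    ]
    print(two_track_rank(candidates))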
Towards Unbiased and Real-Time Staytime Prediction for Live Streaming Recommendation
- Haiyuan Zhao
- Changshuo Zhang
- Yang Wang
- Hao Wang
- Zhen Ouyang
- Bin Yuan
- Qinglei Wang
- Zuotao Liu
Live streaming has emerged as a dynamic content format that delivers real-time and
interactive experiences to users. Distinguished by the short lifespan and immersive
nature of live rooms, live streaming poses two key challenges for recommendation:
(1) Timeliness: the model must rapidly identify and promote relevant live rooms to
target users within a limited window; and (2) Accurate staytime prediction: since
extended watching often reflects content quality and user satisfaction, precisely
predicting staytime serves as a critical indicator of recommendation relevance and
user engagement. Existing approaches often improve timeliness by repeatedly sending
staytime signals to accelerate model learning. However, this introduces label truncation
bias, distorting the unbiased estimation of high staytime samples. To reconcile these
competing demands, we propose MS3M (Multi-Stream Segmented Staytime Modeling), a novel framework that leverages multiple data streams for faster learning
while employing segmented staytime modeling-converting staytime regression into a
series of time-segmented classification tasks to ensure unbiased training. Furthermore,
to address the sparsity of high staytime samples, MS3M's task-dependent architecture
allows high staytime parameters to leverage prior knowledge from low staytime data,
significantly improving generalization for long-duration watching behaviors. Extensive
offline experiments and online A/B tests on TikTok confirm that MS3M effectively balances
timeliness and unbiased learning, leading to substantial gains in recommendation accuracy.
The proposed approach currently serves TikTok's live streaming recommendation system,
contributing to continuous improvement in user watching experience.
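The conversion of staytime regression into time-segmented classification can be illustrated with a short sketch (the segment boundaries below are assumptions, not the paper's configuration):

    import numpy as np

    def segment_labels(staytime_sec, boundaries=(10, 30, 60, 180, 600)):
        """Convert raw staytime (seconds) into a vector of binary targets,
        one per time segment: target k is 1 if staytime exceeded boundary k.
        Truncated observations simply have their later targets masked during training."""
        boundaries = np.asarray(boundaries, dtype=float)
        return (np.asarray(staytime_sec, dtype=float)[:, None] > boundaries).astype(np.float32)

    print(segment_labels([5, 45, 1200]))
    # [[0. 0. 0. 0. 0.]
    #  [1. 1. 0. 0. 0.]
    #  [1. 1. 1. 1. 1.]]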
Personalized Multi Modal Alignment Encoding for CTR-Recommendation in WeChat
- Jiawei Zheng
- Hao Gu
- Lingling Yi
- Jie Wen
- Chuan Chen
In recent years, with the significant evolution of multi-modal large models, many
recommender researchers have realized the potential of multi-modal information for user
interest modeling. In industrial recommendation systems, a widely used modeling architecture
first pre-trains a multi-modal model to provide omnipotent representations and
then encodes them into discrete semantic IDs for the online model. Although such a paradigm achieves
remarkable improvements, two problems still limit model performance:
(1) Modalities Mapping Independence: each modal representation is independently mapped
to a semantic space and assigned its own code, which ignores the consistency and
complementarity of different modalities of the same item. (2) User-irrelevant Clustering
Assignment: for a specific item, most existing quantization methods assume that
all users share the same cluster assignments, failing to account for the varying interpretations
and emotional responses users may have toward an item.
To address these challenges, we propose a Personalized Multi Modal Alignment Encoding
for CTR-Recommendation in WeChat (PMMAE for short). First, we design a multi-modal
contrastive alignment module to ensure the consistency of the encodings of the various modalities.
We then fuse them to form a consistent and comprehensive semantic label. Second, a
meta network fed with users' interest embeddings is learned to generate personalized
functions to achieve personalized clustering assignment for each user. Benefiting
from the meta-generated personalized assignment function, we can take full account
of the variability in users' understanding of items. Extensive experimental results
demonstrate that our model PMMAE significantly outperforms baseline models on both
offline performance and online A/B tests in WeChat recommendation scenario. Our model
has been deployed on WeChat's various services, serving hundreds of millions of users
daily.
RankMixer: Scaling Up Ranking Models in Industrial Recommenders
- Jie Zhu
- Zhifang Fan
- Xiaoxie Zhu
- Yuchen Jiang
- Hangyu Wang
- Xintian Han
- Haoran Ding
- Xinmin Wang
- Wenlin Zhao
- Zhen Gong
- Huizhi Yang
- Zheng Chai
- Zhe Chen
- Yuchao Zheng
- Qiwei Chen
- Feng Zhang
- Xun Zhou
- Peng Xu
- Xiao Yang
- Di Wu
- Zuotao Liu
Recent progress on large language models (LLMs) has spurred interest in scaling up
recommendation systems, yet two practical obstacles remain. First, training and serving
cost on industrial Recommenders must respect strict latency bounds and high QPS demands.
Second, most human-designed feature-crossing modules in ranking models were inherited
from the CPU era and fail to exploit modern GPUs, resulting in low Model Flops Utilization
(MFU) and poor scalability. We introduce RankMixer, a hardware-aware model design
tailored towards a unified and scalable feature-interaction architecture. RankMixer
retains the transformer's high parallelism while replacing quadratic self-attention
with a multi-head token mixing module for higher efficiency. In addition, RankMixer maintains
both the modeling for distinct feature subspaces and cross-feature-space interactions
with Per-token FFNs. We further extend it to one billion parameters with a Sparse-MoE
variant for higher ROI. A dynamic routing strategy is adopted to address the inadequacy
and imbalance of experts training. Experiments show RankMixer's superior scaling abilities
on a trillion-scale production dataset. By replacing previously diverse handcrafted
low-MFU modules with RankMixer, we boost the model MFU from 4.5% to 45%, and scale our online ranking model parameters by two orders of magnitude while maintaining
roughly the same inference latency. We verify RankMixer's universality with online
A/B tests across two core application scenarios (Recommendation and Advertisement).
Finally, we launch the one-billion-dense-parameter RankMixer for full-traffic serving without increasing the serving
cost, which improves user active days by 0.3% and total in-app usage duration by 1.08%.
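A minimal sketch of a mixer-style block with token mixing (no self-attention) and per-token FFNs is given below (an approximation of the general idea, not RankMixer's exact multi-head token mixing or Sparse-MoE design; all dimensions are assumptions):

    import torch
    import torch.nn as nn

    class PerTokenFFN(nn.Module):
        """Each feature token gets its own feed-forward network (no weight sharing)."""
        def __init__(self, num_tokens, dim, hidden):
            super().__init__()
            self.ffns = nn.ModuleList(
                nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
                for _ in range(num_tokens)
            )

        def forward(self, x):                 # x: (batch, num_tokens, dim)
            return torch.stack([ffn(x[:, i]) for i, ffn in enumerate(self.ffns)], dim=1)

    class MixerBlock(nn.Module):
        """Token mixing via a linear map across the token axis (no self-attention),
        followed by per-token FFNs, both with residual connections."""
        def __init__(self, num_tokens, dim, hidden):
            super().__init__()
            self.token_mix = nn.Linear(num_tokens, num_tokens)
            self.per_token_ffn = PerTokenFFN(num_tokens, dim, hidden)

        def forward(self, x):                 # x: (batch, num_tokens, dim)
            x = x + self.token_mix(x.transpose(1, 2)).transpose(1, 2)
            return x + self.per_token_ffn(x)

    block = MixerBlock(num_tokens=8, dim=64, hidden=256)
    out = block(torch.randn(4, 8, 64))
    print(out.shape)                          # torch.Size([4, 8, 64])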
SESSION: Resource Papers
IARD: Intruder Activity Recognition Dataset for Threat Detection
- Shehzad Ali
- Md Tanvir Islam
- Ik Hyun Lee
- Saeed Anwar
- Javier Del Ser
- Khan Muhammad
Home security and surveillance systems are rapidly evolving, with Artificial Intelligence
(AI) playing a transformative role in enhancing safety and threat detection. While
several AI methods and datasets for intruder-related risk assessment exist, they predominantly
focus on face detection and recognition, leaving a significant gap in addressing high-risk
scenarios involving malicious intent, such as theft or harm. The lack of dedicated
datasets for recognizing complex intruder activities, such as carrying weapons or
engaging in destructive actions like kicking doors or breaking locks, limits the development
of robust solutions. This work bridges this gap by introducing the Intruder Activity
Recognition Dataset (IARD), a video dataset specifically designed to recognize four
critical intruder activities: Armed Intruder, Door Kick, Intruder Inside and Lock Breaking. Leveraging IARD, we thoroughly benchmark various state-of-the-art methods, among
which a Vision Transformer is found to achieve an impressive 93.3% accuracy in recognizing
intruder actions. Our contribution highlights the potential of IARD in advancing AI-driven
surveillance systems, providing a foundational dataset and benchmark for recognizing
complex intruder activities.
ReDSM5: A Reddit Dataset for DSM-5 Depression Detection
- Eliseo Bao
- Anxo Perez
- Javier Parapar
Depression is a pervasive mental health condition that affects hundreds of millions
of individuals worldwide, yet many cases remain undiagnosed due to barriers in traditional
clinical access and pervasive stigma. Social media platforms, and Reddit in particular,
offer rich, user-generated narratives that can reveal early signs of depressive symptomatology.
However, existing computational approaches often label entire posts simply as depressed
or not depressed, without linking language to specific criteria from the DSM-5, the
standard clinical framework for diagnosing depression. This limits both clinical relevance
and interpretability. To address this gap, we introduce ReDSM5, a novel Reddit corpus
comprising 1484 long-form posts, each exhaustively annotated at the sentence level
by a licensed psychologist for the nine DSM-5 depression symptoms. For each label,
the annotator also provides a concise clinical rationale grounded in DSM-5 methodology.
We conduct an exploratory analysis of the collection, examining lexical, syntactic,
and emotional patterns that characterize symptom expression in social media narratives.
Compared to prior resources, ReDSM5 uniquely combines symptom-specific supervision
with expert explanations, facilitating the development of models that not only detect
depression but also generate human-interpretable reasoning. We establish baseline
benchmarks for both multi-label symptom classification and explanation generation,
providing reference results for future research on detection and interpretability.
hopwise: A Python Library for Explainable Recommendation based on Path Reasoning over
Knowledge Graphs
- Ludovico Boratto
- Gianni Fenu
- Mirko Marras
- Giacomo Medda
- Alessandro Soccol
Explainability is becoming central to the development of responsible recommender systems,
especially as path reasoning over knowledge graphs has seen increased adoption for extracting
structured, semantic user-item connections. However, reproducible research in this
field remains limited due to fragmented implementations, missing utilities, and the
lack of standardized evaluation pipelines. In this paper, we propose hopwise, an open-source
library that supports the full life-cycle of explainable path reasoning recommendation
methods over knowledge graphs, from knowledge graph preparation to explanation path
delivery and evaluation. Rather than creating a new library from scratch, hopwise
builds upon the modular and widely adopted RecBole ecosystem, enriching it with more
knowledge graphs, path sampling utilities, path reasoning methods, and metrics for
evaluating explanation path utility, coverage, and diversity. We show the framework's
utility by means of a benchmark including two knowledge graphs and several recommendation
methods. Code and Data: https://github.com/tail-unica/hopwise.
PyLate: Flexible Training and Retrieval for Late Interaction Models
- Antoine Chaffin
- Raphaël Sourty
Neural ranking has become a cornerstone of modern information retrieval. While single
vector search remains the dominant paradigm, it suffers from the shortcoming of compressing
all the information into a single vector. This compression leads to notable performance
degradation in out-of-domain, long-context, and reasoning-intensive retrieval tasks.
Multi-vector approaches pioneered by ColBERT aim to address these limitations by preserving
individual token embeddings and computing similarity via the MaxSim operator. This
architecture has demonstrated superior empirical advantages, including enhanced out-of-domain
generalization, long-context handling, and performance in complex retrieval scenarios.
Despite these compelling empirical results and clear theoretical advantages, the practical
adoption and public availability of late interaction models remain low compared to
their single-vector counterparts, primarily due to a lack of accessible and modular
tools for training and experimenting with such models. To bridge this gap, we introduce
PyLate, a streamlined library built on top of Sentence Transformers to support multi-vector
architectures natively, inheriting its efficient training, advanced logging, and automated
model card generation while requiring minimal code changes to code templates users
are already familiar with. By offering multi-vector-specific features such as efficient
indexes, PyLate aims to accelerate research and real-world application of late interaction
models, thereby unlocking their full potential in modern IR systems. Finally, PyLate
has already enabled the development of state-of-the-art models, including GTE-ModernColBERT
and Reason-ModernColBERT, demonstrating its practical utility for both research and
production environments.
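For reference, the MaxSim late-interaction operator at the core of ColBERT-style models can be written in a few lines (this sketch shows the operator itself, not PyLate's API; tensor shapes are assumptions):

    import torch
    import torch.nn.functional as F

    def maxsim(query_tokens, doc_tokens):
        """ColBERT-style late interaction: for every query token, take its
        maximum similarity over the document tokens, then sum over query tokens.
        query_tokens: (num_q, dim), doc_tokens: (num_d, dim), both L2-normalized."""
        sim = query_tokens @ doc_tokens.t()        # (num_q, num_d) token-level similarities
        return sim.max(dim=1).values.sum()

    q = F.normalize(torch.randn(8, 128), dim=-1)
    d = F.normalize(torch.randn(300, 128), dim=-1)
    print(maxsim(q, d).item())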
VideoAVE: A Multi-Attribute Video-to-Text Attribute Value Extraction Dataset and Benchmark
Models
- Ming Cheng
- Tong Wu
- Jiazhen Hu
- Jiaying Gong
- Hoda Eldardiry
Attribute Value Extraction (AVE) is important for structuring product information
in e-commerce. However, existing AVE datasets are primarily limited to text-to-text
or image-to-text settings, lacking support for product videos, diverse attribute coverage,
and public availability. To address these gaps, we introduce VideoAVE, the first publicly
available video-to-text e-commerce AVE dataset across 14 different domains and covering
172 unique attributes. To ensure data quality, we propose a post-hoc CLIP-based Mixture
of Experts filtering system (CLIP-MoE) to remove the mismatched video-product pairs,
resulting in a refined dataset of 224k training samples and 25k evaluation samples. In order
to evaluate the usability of the dataset, we further establish a comprehensive benchmark
by evaluating several state-of-the-art video vision language models (VLMs) under both
attribute-conditioned value prediction and open attribute-value pair extraction tasks.
Our results analysis reveals that video-to-text AVE remains a challenging problem,
particularly in open settings, and there is still room for developing more advanced
VLMs capable of effectively leveraging temporal information. The dataset and benchmark
code for VideoAVE are available at: https://github.com/gjiaying/VideoAVE.
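A drastically simplified stand-in for CLIP-based pair filtering, thresholding the mean similarity between a product's text embedding and its video frame embeddings, is sketched below (the actual CLIP-MoE system is more involved; the threshold and the random embeddings here are assumptions):

    import numpy as np

    def filter_pairs(frame_embs, text_embs, threshold=0.25):
        """Keep a video-product pair only if the mean cosine similarity between
        the product text embedding and the video's frame embeddings exceeds a
        threshold. frame_embs: list of (num_frames_i, dim); text_embs: (num_pairs, dim)."""
        keep = []
        for i, frames in enumerate(frame_embs):
            f = frames / np.linalg.norm(frames, axis=1, keepdims=True)
            t = text_embs[i] / np.linalg.norm(text_embs[i])
            keep.append(float((f @ t).mean()) >= threshold)
        return keep

    # Toy stand-ins for precomputed CLIP embeddings (3 pairs, 512-d).
    rng = np.random.default_rng(0)
    frame_embs = [rng.normal(size=(16, 512)) for _ in range(3)]
    text_embs = rng.normal(size=(3, 512))
    print(filter_pairs(frame_embs, text_embs))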
ERASURE: A Modular and Extensible Framework for Machine Unlearning
- Andrea D'Angelo
- Claudio Savelli
- Gabriele Tagliente
- Flavio Giobergia
- Elena Baralis
- Giovanni Stilo
Machine Unlearning (MU) is an emerging research area that enables models to selectively
forget specific data, a critical requirement for privacy compliance (e.g., GDPR, CCPA)
and security. However, the lack of standardized benchmarks makes evaluating and developing
unlearning methods difficult. To address this gap, we introduce ERASURE, a benchmarking
and development framework designed to systematically assess MU techniques. ERASURE
provides a modular, extensible, open-source environment with real-world datasets and
standardized unlearning measures. The framework is designed with configuration-driven
workflows and an inversion of control architecture, allowing integration of new datasets,
models, and evaluation measures. ERASURE advances trustworthy AI research as a tool
for researchers to develop and benchmark new MU methods.
YTCommentVerse: A Multi-Category Multi-Lingual YouTube Comment Corpus
- Hridoy Sankar Dutta
- Biswadeep Khan
In this paper, we introduce YTCommentVerse, a large-scale multilingual and multi-category
dataset of YouTube comments. It contains over 32 million comments from 178,000 videos
contributed by more than 20 million unique users spanning 15 distinct YouTube content
categories such as Music, News, Education and Entertainment. Each comment in the dataset
includes video and comment IDs, user channel details, upvotes and category labels.
With comments in over 50 languages, YTCommentVerse provides a rich resource for exploring
sentiment, toxicity and engagement patterns across diverse cultural and topical contexts.
This dataset helps fill a major gap in publicly available social media datasets, particularly
for analyzing video-sharing platforms, by combining multiple languages, detailed categories,
and other metadata.
Internet of Things Dataset for Human Operator Activity Recognition in Industrial Environment
- Abdur Forkan
- Prem Prakash Jayaraman
- Clarence Antonmeryl
- Federico Montori
- Abhik Banerjee
- Kaneez Fizza
- Dimitrios Georgakopoulos
In industrial environments, most production-related activities performed by human
operators are often complex. Accurate detection of these activities is pivotal, as
it can greatly help to assess productivity, leading to improvements in worker
training, and, in other scenarios, help ensure a safe work environment and reduce
injuries. Existing datasets on wearable Internet of Things (IoT) for human activity
recognition primarily focus on general activities, such as walking, running, etc.,
and therefore the related machine learning models and datasets are not suitable for application
to industrial environments. In this paper, we present a novel dataset for classifying
human operator activities in a meat processing plant where production line operators
use knives to cut, process and produce meat products. Our dataset contains human operator
activity data captured using wearable IoT sensors collected from a meat processing
production facility. Through extensive experiments using machine and deep learning,
we demonstrate that our dataset is effective and useful for detecting different activities
of a human operator working in an industrial environment. To the best of our knowledge,
this is the only real-world IoT dataset that will be made publicly available to support
further research into industrial activities recognition. Our dataset and related experiments
are available at https://digitalinnovationlab.github.io/mppdataset.
Portuguese post-OCR Resources for Text Optimisation
- Tomás Freitas Osório
- Henrique Lopes Cardoso
Optical Character Recognition (OCR) systems are designed to extract text from images.
While typically optimised for modern documents, they often struggle when applied to
historical documents due to older fonts, complex layouts, and physical degradation,
which can result in noisy outputs. To reduce OCR errors, post-OCR algorithms are commonly
used; however, their development and evaluation require image-transcription pairs.
Compared to other European languages, there is a lack of transcribed documents for
historical Portuguese, especially for texts predating the 19th century. To address
this gap, we introduce Portuguese post-OCR Resources for Text Optimisation (PORTO), a dataset that spans from the 17th to the 20th centuries. PORTO
contains 3,782 image-transcription pairs, along with OCR outputs from four different
systems, providing a valuable resource for the development and evaluation of OCR and
post-OCR methods tailored to historical Portuguese.
ClimateBench-M: A Multi-Modal Climate Data Benchmark with a Simple Generative Method
- Dongqi Fu
- Yada Zhu
- Zhining Liu
- Lecheng Zheng
- Xiao Lin
- Zihao Li
- Liri Fang
- Katherine Tieu
- Onkar Bhardwaj
- Kommy Weldemariam
- Hanghang Tong
- Hendrik Hamann
- Jingrui He
Climate science studies the structure and dynamics of Earth's climate system and seeks
to understand how climate changes over time, where the data is usually stored in the
format of time series, recording the climate features, geolocation, time attributes,
etc. Recently, much research attention has been paid to climate benchmarks. In
addition to the most common task of weather forecasting, several pioneering benchmark
works have been proposed to extend the covered modalities, such as domain-specific applications
like tropical cyclone intensity prediction and flash flood damage estimation, or climate
statements with confidence levels expressed in natural language. To further motivate
the development of artificial intelligence for climate science, in this paper, we first
contribute a multi-modal climate benchmark, i.e., ClimateBench-M, which aligns (1) the time series climate data from ERA5, (2) extreme weather events
data from NOAA, and (3) satellite image data from NASA HLS based on a unified spatial-temporal
granularity. Second, under each data modality, we also propose a simple but strong
generative method that could produce competitive performance in weather forecasting,
thunderstorm alerts, and crop segmentation tasks in the proposed ClimateBench-M. The
data and code of ClimateBench-M are publicly available at https://github.com/iDEA-iSAIL-Lab-UIUC/ClimateBench-M.
FediData: A Comprehensive Multi-Modal Fediverse Dataset from Mastodon
- Min Gao
- Haoran Du
- Wen Wen
- Qiang Duan
- Xin Wang
- Yang Chen
Recently, decentralized online social networks (DOSNs) such as Mastodon have emerged
quickly, bringing new opportunities for studies in user behavior modeling and multi-modal
learning. However, their decentralized architecture presents two key challenges: 1)
Distributed data and inconsistent access strategies across several individual instances
make a unified collection difficult; 2) user-generated content (UGC) contains multiple
modalities while lacking standard organization and high-quality annotation. To address
these issues, we constructed FediData, a comprehensive multi-modal dataset from Mastodon.
Our dataset integrates user profiles, text, images, and social interactions. To validate
FediData's usefulness, we designed and analyzed several tasks and systematically evaluated
the performance of existing state-of-the-art methods. Our analysis reveals the unique
challenges of DOSNs and highlights the value of FediData in DOSN-related studies.
We believe FediData could serve as a foundational dataset for advancing user behavior
analytics, multi-modal learning, and future decentralized web research. All data and
documentation are available in a Zenodo repository at https://zenodo.org/records/15621243
(DOI: 10.5281/zenodo.15621243).
STM-Graph: A Python Framework for Spatio-Temporal Mapping and Graph Neural Network
Predictions
- Amirhossein Ghaffari
- Huong Nguyen
- Lauri Lovén
- Ekaterina Gilman
Urban spatio-temporal data present unique challenges for predictive analytics due
to their dynamic and complex nature. We introduce STM-Graph, an open-source Python
framework that transforms raw spatio-temporal urban event data into graph representations
suitable for Graph Neural Network (GNN) training and prediction. STM-Graph integrates
diverse spatial mapping methods, urban features from OpenStreetMap, multiple GNN models,
comprehensive visualization tools, and a graphical user interface (GUI) suitable for
professional and non-professional users. This modular and extensible framework facilitates
rapid experimentation and benchmarking. It allows integration of new mapping methods
and custom models, making it a valuable resource for researchers and practitioners
in urban computing. The source code of the framework and GUI are available at: https://github.com/Ahghaffari/stm_graph
and https://github.com/tuminguyen/stm_graph_gui.
E2MoCase: A Dataset for Emotional, Event and Moral Observations in News Articles on
High-impact Legal Cases
- Candida M. Greco
- Lorenzo Zangari
- Davide Picca
- Andrea Tagarelli
The way the media report on legal cases can significantly shape public opinion, often
embedding subtle biases that influence societal views on justice, fairness, and morality.
Analyzing these narratives requires a holistic approach that captures their emotional
tone, moral framing, and the specific events they convey. In this work, we introduce
E2MoCase, a novel dataset that enables integrated analysis of emotions, morality,
and events within legal narratives and media coverage. We leverage NLP models to extract
events and predict morality and emotions, providing a multidimensional perspective
on how legal cases are portrayed in news articles. Our experimental evaluation showed
that E2MoCase is beneficial for addressing emotion- and morality-based tasks, which
is also confirmed by a human evaluation of the annotations.
A Large-Scale Web Search Dataset for Federated Online Learning to Rank
- Marcel Gregoriadis
- Jingwei Kang
- Johan Pouwelse
The centralized collection of search interaction logs for training ranking models
raises significant privacy concerns. Federated Online Learning to Rank (FOLTR) offers
a privacy-preserving alternative by enabling collaborative model training without
sharing raw user data. However, benchmarks in FOLTR are largely based on random partitioning
of classical learning-to-rank datasets, simulated user clicks, and the assumption
of synchronous client participation. This oversimplifies real-world dynamics and undermines
the realism of experimental results. We present AOL4FOLTR, a large-scale web search
dataset with ≈ 2.6 million queries from 10,000 users. Our dataset addresses key limitations
of existing benchmarks by including user identifiers, real click data, and query timestamps,
enabling realistic user partitioning, behavior modeling, and asynchronous federated
learning scenarios.
A Large-Scale Dataset of Interactions Between Weibo Users and Platform-Empowered LLM
Agent
- Shaokui Gu
- Yongjie Yin
- Qingyuan Gong
- Fenghua Tong
- Yipeng Zhou
- Qiang Duan
- Yang Chen
We release a large-scale dataset that captures interactions between human users and
CommentRobert, an LLM-based social media agent on Weibo. The dataset contains Weibo posts in which
users actively mention the LLM agent account @CommentRobert, indicating that the users
are interested in interacting with the platform-empowered LLM agent. The dataset contains
557,645 interactions from 304,400 unique users over 17 months. We detail our data
collection methodology, user attributes, and content characteristics, underscoring
the dataset's value in examining real-world human-LLM agent interactions. Our analysis
offers insights into the demographic and behavioral traits of users interested in
the selected LLM agent, interaction dynamics between humans and the agent, and linguistic
patterns in comments. These interactions provide a unique lens through which to explore
how humans perceive, trust, and communicate with LLMs. This dataset enables further
research into modeling human intent understanding, improving LLM agent design, and
studying the evolution of human-LLM agent relationships. Potential applications also
include long-term user engagement prediction and AI-generated comment detection on
social platforms. This constructed dataset is available at https://zenodo.org/records/16921462.
PersonaGen: A Persona-Driven Open-Ended Machine-Generated Text Dataset
- Carmelo Gugliotta
- Lucio La Cava
- Andrea Tagarelli
We present PersonaGen, a novel dataset for investigating persona-driven machine-generated
text (MGT) produced by Open Large Language Models (OLLMs). PersonaGen is specifically
designed to investigate how synthetic persona profiles affect, guide, or manifest
in MGT. We built PersonaGen by pairing curated persona profiles (i.e., descriptions
of characteristics, background, and goals) across eight thematic domains (e.g., Physics,
Education, Medicine) with prompts covering various narrative or opinion-style content
(e.g., stories, commonsense). Open-ended generations were produced by six representative
OLLMs, yielding a total of 1.44 million persona-driven generations. PersonaGen supports
multiple research tasks, such as machine-generated text attribution, persona category
detection, and persona profile identification, thus providing a valuable resource
for studying LLM controllability and role-playing behavior, as well as the impact
of persona profile conditioning in downstream tasks. We have released PersonaGen on the Hugging Face platform at https://doi.org/10.57967/hf/5805.
Pet-Bench: Benchmarking the Abilities of Large Language Models as E-Pets in Social
Network Services
- Hongcheng Guo
- Zheyong Xie
- Shaosheng Cao
- Boyang Wang
- Weiting Liu
- Zheyu Ye
- Zhoujun Li
- Zuozhu Liu
- Wei Lu
As interest in using Large Language Models for interactive and emotionally rich experiences
grows, virtual pet companionship emerges as a novel yet underexplored application.
Existing approaches focus on basic pet role-playing interactions without systematically
benchmarking LLMs for comprehensive companionship. In this paper, we introduce PET-BENCH,
a dedicated benchmark that evaluates LLMs across both self-interaction and human-interaction dimensions. Unlike prior work, PET-BENCH emphasizes self-evolution and developmental
behaviors alongside interactive engagement, offering a more realistic reflection of
pet companionship. It features diverse tasks such as intelligent scheduling, memory-based
dialogues, and psychological conversations, with over 7,500 interaction instances
designed to simulate pet behaviors. Evaluation of 28 LLMs reveals significant performance
variations linked to model size and inherent capabilities, underscoring the need for
specialized optimization in this domain. PET-BENCH serves as a foundational resource
for benchmarking pet-related LLM abilities and advancing emotionally immersive human-pet
interactions.
EFT-LR: Benchmarking Learning Rate Policies in Parameter-Efficient Large Language
Model Fine-tuning
- Md Tasnim Jawad
- Yanzhao Wu
Large Language Models (LLMs) have achieved extensive impacts across various real-world
data mining applications. Given the extremely high cost of training or fine-tuning
LLMs, parameter-efficient fine-tuning (e.g., LoRA) has emerged as a popular and practical
approach for adapting pre-trained general-purpose LLMs to specific downstream tasks.
Among the various hyperparameters involved in parameter-efficient fine-tuning of LLMs,
the learning rate (LR) plays a crucial role in determining the overall performance.
However, there is a lack of a systematic benchmark framework for exploring and understanding how different
LR policies influence the effectiveness of parameter-efficient LLM fine-tuning, which
makes it challenging to select an optimal LR policy. To address this critical research
gap, this paper introduces a systematic benchmark, EFT-LR, for assessing and selecting
LR policies for effective parameter-efficient fine-tuning of LLMs. We first present
a collection of seven popular LR policies spanning three major categories in the literature.
We then perform parameter-efficient fine-tuning of LLMs using these LR policies and
assess fine-tuned LLMs on eight downstream tasks. Our empirical analysis using EFT-LR
provides an in-depth investigation of the impacts of different LR policies on parameter-efficient
LLM fine-tuning, offering practical guidelines for practitioners. We provide the source
code at https://github.com/mlsysx/EFT-LR.
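To illustrate what an LR policy is in this context, the following sketch defines a few common schedules as functions of the training step (these are generic examples, not the seven policies benchmarked in the paper; all constants are assumptions):

    import math

    def constant_lr(base_lr):
        return lambda step, total: base_lr

    def cosine_decay(base_lr, min_lr=0.0):
        return lambda step, total: min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * step / total))

    def step_decay(base_lr, gamma=0.5, every=1000):
        return lambda step, total: base_lr * gamma ** (step // every)

    def linear_warmup(policy, warmup_steps=100):
        def lr(step, total):
            scale = min(1.0, step / max(1, warmup_steps))
            return scale * policy(step, total)
        return lr

    policies = {
        "constant": constant_lr(2e-4),
        "cosine":   linear_warmup(cosine_decay(2e-4)),
        "step":     step_decay(2e-4),
    }
    for name, p in policies.items():
        print(name, [round(p(s, 5000), 6) for s in (0, 100, 2500, 5000)])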
The Yelp Collaborative Knowledge Graph
- Theis Jendal
- Mads Corfixen
- Magnus Olesen
- Peter Dolog
- Katja Hose
- Daniele Dell'Aglio
- Matteo Lissandrini
Yelp Open Dataset (YOD) is a widely used dataset for Recommender Systems (RS). Multiple
Knowledge Graphs (KGs) have been built for YOD, but they have various issues: the
conversion processes usually do not follow state-of-the-art methodologies, fail to
properly link to other KGs, do not link to existing vocabularies, ignore important
data, and are generally of small size. Instead, we present the Yelp Collaborative
Knowledge Graph (YCKG), in which we correctly integrate taxonomies, product categories,
business locations, and the Yelp social network, following common practices within the
Semantic Web community and overcoming all these issues. As a result, the YCKG includes
150k businesses and 16.9M reviews from 1.9M distinct real users, resulting in over
244 million triples with 144 distinct predicates over about 72 million resources, with
an average in-degree and out-degree of 3.3 and 12.2, respectively. Further, we release
both the data and the code used to generate the KG for inspection and further extensions.
This dataset can be used to develop and test both recommendation and data-mining algorithms
able to exploit rich and semantically meaningful knowledge. We publish the code
for the YCKG construction at: https://github.com/MadsCorfixen/The-Yelp-Collaborative-Knowledge-Graph.
Generative Recommendation with Semantic IDs: A Practitioner's Handbook
- Clark Mingxuan Ju
- Liam Collins
- Leonardo Neves
- Bhuvesh Kumar
- Louis Yufeng Wang
- Tong Zhao
- Neil Shah
Generative recommendation (GR) has gained increasing attention for its promising performance
compared to traditional models. A key factor contributing to the success of GR is
the semantic ID (SID), which converts continuous semantic representations (e.g., from
large language models) into discrete ID sequences. However, varied modeling techniques,
hyper-parameters, and experimental setups in existing literature make direct comparisons
between GR proposals challenging. Furthermore, the absence of an open-source, unified
framework hinders systematic benchmarking and extension, slowing model iteration.
To address this challenge, our work introduces and open-sources a framework for Generative
Recommendation with semantic ID, namely GRID, specifically designed for modularity
to facilitate easy component swapping and accelerate idea iteration. Using GRID, we
systematically experiment with and ablate different components of GR models with SIDs
on public benchmarks. Our comprehensive experiments with GRID reveal that many overlooked
architectural components in GR models with SIDs substantially impact performance.
This both offers novel insights and validates the utility of an open-source platform
for robust benchmarking and GR research advancement. GRID is open-sourced at https://github.com/snap-research/GRID.
CausalBench-ER: Causally-Informed Explanations and Recommendations for Reproducible
Benchmarking
- Ahmet Kapkiç
- Pratanu Mandal
- Abhinav Gorantla
- Shu Wan
- Ertuğrul Çoban
- Paras Sheth
- Huan Liu
- K. Selçuk Candan
Due to the critical role causality plays in decision-making, the state-of-the-art
in machine learning for causality is rapidly evolving. With the rapid development and
deployment of new models, datasets, and metrics, it is increasingly difficult for
researchers and practitioners to identify the most suitable approach for their problem.
Models exhibit different performance when trained on different data, or even when
used under different hardware/software platforms, making it challenging for
users to select the appropriate setup for their problem. To address these
difficulties, we present a computing framework, CausalBench-ER, that serves not only
as a benchmarking platform for causal machine learning models, but also as a resource
that can explain benchmarking results across different metrics, software, and hardware
setups. Furthermore, CausalBench-ER recommends additional scenarios to consider to
help pave the way towards more robust benchmarking.
Datasets for Supervised Adversarial Attacks on Neural Rankers
- Amir Khosrojerdi
- Amin Bigdeli
- Radin Hamidi Rad
- Morteza Zihayat
- Charles L. A. Clarke
- Ebrahim Bagheri
We introduce a novel dataset for adversarial rank attacks against neural rankers,
enabling systematic research on robustness. Unlike prior unsupervised or surrogate-based
methods, our approach uses Retrieval-Augmented Generation (RAG) with a Large Language
Model (LLM) to create high-quality adversarial examples that subtly alter rankings
while maintaining coherence and relevance. Built via a self-refining LLM-Ranker feedback
loop, the dataset includes two tiers: Gold and Diamond, based on attack strength,
along with rich metadata, ranking labels, and quality metrics. Released with code
and prompts, it supports training, evaluation, and benchmarking of robust ranking
systems.
RuSemCor: A Word Sense Disambiguation corpus for Russian
- Alexander Kirillovich
- Ilia Karpov
- Natalia Loukachevitch
- Maksim Kulaev
- Dmitry Ilvovsky
We present RuSemCor, an open Word Sense Disambiguation (WSD) corpus for Russian. The
corpus was constructed by manually linking tokens from the OpenCorpora corpus to senses
in the Russian wordnet RuWordNet. It consists of 869 documents with 121,710 tokens
of which 51,588 are wordnet annotated. The resource is represented using the NIF,
OLiA, OntoLex, and Global WordNet ontologies and integrated into the Linguistic Linked
Open Data cloud. We used RuSemCor as a diagnostic benchmark to evaluate a range of
WSD methods. Our experiments yielded three main findings. 1) Generative LLMs substantially
outperform traditional knowledge-based methods such as Personalized PageRank. 2) Despite
their strengths, generative LLMs do not surpass encoder-based models specifically
trained for WSD. 3) Incorporating lexical-semantic relations from RuWordNet produces
mixed results: it enhances the performance of encoder-based models and leading LLMs
like GPT-4, DeepSeek, and Mistral 24B, but tends to degrade accuracy for smaller generative
models such as GPT-3 and Mistral 7B. The resource is distributed under the CC BY-SA
open license and is available at: https://github.com/LLOD-Ru/rusemcor.
NLP-QA: A Large-scale Benchmark for Informative Question Answering over Natural Language
Processing Documents
- Avishek Lahiri
- Debarshi Kumar Sanyal
- Imon Mukherjee
The exponential growth of research literature across AI domains necessitates efficient
information extraction via Question Answering (QA). However, scholarly QA development
is hindered by the scarcity of large-scale, expertly-annotated datasets that are
needed for modern deep learning models. To address this gap and advance scholarly
QA, we introduce NLP-QA, a new dataset of question-answer pairs derived from NLP research
documents. We overcome the challenge of costly expert annotation by proposing a novel,
automated construction method that leverages content from conference presentation
slides. We create two versions of our dataset - one by extracting QA pairs from individual
slides (NLP-QA-SS) and the other by extracting QA pairs from the collection of slides
for a paper as a whole (NLP-QA-MS). We benchmark several Large Language Models (LLMs)
on NLP-QA, both with and without finetuning, to establish performance baselines,
demonstrating the challenging nature as well as the utility of the dataset.
In particular, we demonstrate how challenging our dataset is for zero-shot long-context
reasoning with LLMs without additional finetuning, and show that there is a significant jump in
the LLMs' performance after finetuning with NLP-QA. The dataset and code are publicly
available at https://github.com/AvishekLahiri/NLP-QA.git
From Rules to Flexibility: A Resource and Method for SEC Item Extraction in Post-2021
10-K Filings
- Xiao Li
- Changhong Jin
- Ruihai Dong
10-K filings represent a significant repository for financial text analysis, encompassing
both standardized quantitative indicators and rich unstructured text content. In recent
years, the efficacy of rule-based extraction methods has been progressively limited
due to changes in the 10-K filing format. In this study, we propose a novel layout-robust
segmentation approach that achieves identification of financial report items by combining
fuzzy matching and structural heuristics. Our approach has been employed in recent
10-K filings (2021-2024), resulting in a standardized dataset with item-level segmentation.
Furthermore, an automated validation protocol was developed in order to assess coverage
and ranking consistency. Analysis of the protocol indicates that our approach achieves
an average extraction accuracy of 87.8%. Finally, a case study utilising Item 1A to
forecast short-term stock volatility provides a practical demonstration of the application
of the corpus. This case study not only serves to validate the corpus but also showcases
its compatibility with EDGAR-CORPUS. Code, benchmarks, segmented 10-K filings, and
case studies are publicly available in our GitHub repository: https://github.com/johnny-xiao-li/Flex_10K.
A Comprehensive Toolkit for Generalized Robust Vision
- Zhao Li
- Yuefeng Chen
- Hui Xue
- Xiaofeng Mao
While deep neural networks (DNNs) excel in computer vision tasks, their real-world
deployment is hindered by robustness limitations compared to human perception. Adversarial
attacks and data distribution shifts remain critical vulnerabilities, degrading model
performance under practical conditions. To address these challenges and advance robustness
research, we introduce a comprehensive, user-friendly toolkit for training, evaluating,
and analyzing robust vision models. It targets two key dimensions of robustness: 1)
Adversarial robustness: defending against malicious worst-case perturbations (adversarial
examples); 2) Natural robustness: maintaining performance under real-world corruptions
and distribution shifts. Through extensive image classification benchmarks, our toolkit
enables precise model assessment. We envision this toolkit accelerating the development
of practically robust models and bridging the gap between machine and human vision
capabilities.
ECKGBench: Benchmarking Large Language Models in E-commerce Leveraging Knowledge Graph
- Langming Liu
- Haibin Chen
- Yuhao Wang
- Yujin Yuan
- Shilei Liu
- Wenbo Su
- Xiangyu Zhao
- Bo Zheng
Large language models (LLMs) have demonstrated their capabilities across various natural
language processing (NLP) tasks. Their potential in e-commerce is also substantial,
evidenced by existing implementations in scenarios such as platform search and recommender
systems. One persistent concern associated with LLMs is the factuality issue (e.g.,
hallucination), which is urgent in e-commerce due to its significant impact on user
experience and revenue. While some methods aim to evaluate the factuality of LLMs,
issues such as lack of objectivity, high consumption, and lack of domain expertise
arise. To this end, leveraging a collected knowledge graph (KG) as a reliable source,
we propose ECKGBench, a question-answering dataset to assess LLMs' capacity in e-commerce. Specifically,
each question is automatically generated based on one KG triple through a standardized
pipeline, guaranteeing evaluation quality and reliability. We evaluate advanced LLMs
using ECKGBench and provide insights into experimental results. The dataset is available
online at https://github.com/OpenStellarTeam/ECKGBench.
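A toy sketch of turning a KG triple into a multiple-choice question is shown below (illustrative only; the paper's standardized pipeline, the e-commerce KG, and the entities here are assumptions):

    import random

    def triple_to_question(triple, distractors, rng=random.Random(0)):
        """Toy template: turn a (head, relation, tail) triple into a
        multiple-choice question, mixing the true tail with distractor entities."""
        head, relation, tail = triple
        options = distractors[:3] + [tail]
        rng.shuffle(options)
        return {
            "question": f"What is the {relation} of {head}?",
            "options": options,
            "answer": options.index(tail),
        }

    q = triple_to_question(
        ("Wireless Mouse X1", "brand", "Acme"),
        distractors=["Globex", "Initech", "Umbrella"],
    )
    print(q)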
GCondenser: Benchmarking Graph Condensation
- Yilun Liu
- Ruihong Qiu
- Zi Huang
Large-scale graphs are valuable for graph representation learning, but the vast volume
of data often hinders model building efficiency. Graph condensation (GC) addresses
this challenge by compressing a large graph into a significantly smaller one that
still supports effective model training. While recent studies have proposed various
techniques to enhance condensation effectiveness, comprehensive and practical evaluations
of these methods remain limited. In this paper, we introduce GCondenser, a large-scale
graph condensation toolkit designed to facilitate flexible development, holistic evaluation
and comparison of mainstream GC approaches. GCondenser provides a standardised GC
pipeline with condensation, validation, and evaluation stages, and offers straightforward
extensibility to accommodate new methods and datasets. Additionally, we conduct a
thorough empirical study of existing GC methods, offering insights into multiple facets
of condensation performance. The toolkit is available at https://github.com/superallen13/GCondenser.
CSMD: Curated Multimodal Dataset for Chinese Stock Analysis
- Yu Liu
- Zhuoying Li
- Ruifeng Yang
- Fengran Mo
- Cen Chen
The stock market is a complex and dynamic system, where it is non-trivial for researchers
and practitioners to uncover underlying patterns and forecast stock movements. The
existing studies for stock market analysis rely on leveraging various types of information
to extract useful factors, which are highly conditional on the quality of the data
used. However, the currently available resources are mainly based on the U.S. stock
market in English, which makes them difficult to adapt to other countries. To address these
issues, we propose CSMD, a multimodal dataset curated specifically for analyzing the Chinese stock market
with meticulous processing for validated quality. In addition, we develop a lightweight
and user-friendly framework LightQuant for researchers and practitioners with expertise in financial domains. Experimental
results on top of our datasets and framework with various backbone models demonstrate
their effectiveness compared with using existing datasets. The datasets and code are
publicly available at the link: https://github.com/ECNU-CILAB/LightQuant.
Multimodal Banking Dataset: Understanding Client Needs through Event Sequences
- Dzhambulat Mollaev
- Ivan Kireev
- Mikhail Orlov
- Alexander Kostin
- Ivan Karpukhin
- Maria Postnova
- Gleb Gusev
- Andrey Savchenko
Financial organizations collect huge amounts of temporal (sequential) data about clients, typically gathered from multiple sources (modalities). Despite the urgent practical need, the development of deep learning techniques suitable for handling
such data is limited by the absence of large open-source multi-source real-world datasets
of event sequences. To fill this gap, which is mainly caused by security reasons,
we present the first industrial-scale publicly available multimodal banking dataset,
MBD, that contains information on more than 2M corporate clients of a large bank.
Clients are represented by several data sources: 950M bank transactions, 1B geo position
events, 5M embeddings of dialogues with technical support, and monthly aggregated
purchases of four bank products. All entries are properly anonymized from real proprietary
bank data, and the experiments confirm that our anonymization preserves all information significant for the introduced downstream tasks. MBD supports a campaigning task (predicting future customer purchases). We provide numerical results for state-of-the-art event sequence modeling techniques that demonstrate the superiority of fusion baselines
over single-modal techniques for this task. HuggingFace Link: https://huggingface.co/datasets/ai-lab/MBD
Github Link: https://github.com/Dzhambo/MBD
SparseKmeans: Efficient K-means Clustering For Sparse Data
- Khoi Nguyen Pham Dang
- He Zhe Lin
- Chih Jen Lin
We introduce SparseKmeans, the first Python package for fast K-means clustering on
high-dimensional sparse data. Most existing K-means implementations, such as scikit-learn,
are only optimized for dense data and do not run efficiently on sparse inputs. In
this work, we thoroughly investigate how to accelerate widely used K-means algorithms
on sparse data via matrix operations. In particular, we propose a new design of Elkan's
method that aggregates distance computations and reduces fragmented memory access.
By analyzing the structure of key matrices and leveraging highly optimized sparse
matrix libraries, SparseKmeans achieves up to 9x speedup over scikit-learn. The package
is available at https://github.com/cjlin1/sparsekmeans.
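As a rough sketch of the general matrix-operation idea behind batching K-means distance computations on sparse inputs (an assumed illustration, not the SparseKmeans package's actual design), the assignment step can be expressed with a single sparse matrix product:

```python
# Hypothetical sketch: K-means assignment step via sparse matrix operations.
# This illustrates the general "aggregate distance computations" idea only;
# it is NOT the SparseKmeans implementation.
import numpy as np
import scipy.sparse as sp

def assign_clusters(X, centers):
    """X: (n, d) CSR sparse matrix; centers: (k, d) dense array."""
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2; the ||x||^2 term is constant
    # per point, so it can be dropped when taking the argmin over centers.
    cross = X @ centers.T                      # (n, k) via one sparse matmul
    c_norms = (centers ** 2).sum(axis=1)       # (k,)
    scores = -2.0 * np.asarray(cross) + c_norms  # smaller means closer
    return np.asarray(scores.argmin(axis=1)).ravel()

# toy usage on high-dimensional sparse data
X = sp.random(1000, 50000, density=0.001, format="csr", random_state=0)
centers = np.asarray(X[:8].todense())          # naive initialization for illustration
labels = assign_clusters(X, centers)
```

Framing the assignment as one sparse-dense product keeps memory access contiguous instead of looping over points and centers individually.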
A Content-Driven Micro-Video Recommendation Dataset at Scale
- Yongxin Ni
- Yu Cheng
- Xiangyan Liu
- Junchen Fu
- Youhua Li
- Xiangnan He
- Yongfeng Zhang
- Fajie Yuan
Micro-videos have emerged as a popular form of content, leading to extensive
research in micro-video recommendation with significant implications for the entertainment,
advertising, and e-commerce industries. However, the lack of publicly available large-scale
micro-video datasets presents a challenge for developing effective recommender systems.
To address this challenge, we introduce a comprehensive and diverse micro-video recommendation
dataset, referred to as ''MicroLens.'' This dataset comprises nine million user-item
interaction behaviors, one million users, and 91 thousand full-length micro-videos.
It includes rich modality information such as titles, cover images, and audio associated
with the videos. MicroLens serves as a benchmark for content-driven micro-video
recommendation, allowing researchers to leverage diverse video modality information,
particularly the raw video features, to enhance the effectiveness of recommender systems.
This goes beyond the traditional reliance on item IDs or off-the-shelf pre-extracted
video/visual features, providing new avenues for improving recommendation accuracy
and personalization. We have conducted extensive experiments on MicroLens, benchmarking
multiple recommender models and video encoders, which have provided valuable insights
into the performance of micro-video recommendation. We anticipate that this dataset
will not only benefit the recommender system community but also foster advancements
in the field of video understanding. Our datasets, code, and additional documents
are available at https://github.com/westlake-repl/MicroLens.
S2Cap: A Benchmark and a Baseline for Singing Style Captioning
Singing voices contain much richer information than common voices, including varied
vocal and acoustic properties. However, current open-source audio-text datasets for
singing voices capture only a narrow range of attributes and lack acoustic features,
leading to limited utility for downstream tasks such as style captioning. To
fill this gap, we formally define the singing style captioning task and present S2Cap,
a dataset of singing voices with detailed descriptions covering diverse vocal, acoustic,
and demographic characteristics. Using this dataset, we develop an efficient and straightforward
baseline algorithm for singing style captioning. The dataset is available at https://zenodo.org/records/15673764.
Revisiting Pre-processing Group Fairness: A Modular Benchmarking Framework
- Brodie Oldfield
- Ziqi Xu
- Sevvandi Kandanaarachchi
As machine learning systems become increasingly integrated into high-stakes decision-making
processes, ensuring fairness in algorithmic outcomes has become a critical concern.
Methods to mitigate bias typically fall into three categories: pre-processing, in-processing,
and post-processing. While significant attention has been devoted to the latter two,
pre-processing methods, which operate at the data level and offer advantages such
as model-agnosticism and improved privacy compliance, have received comparatively
less focus and lack standardised evaluation tools. In this work, we introduce FairPrep,
an extensible and modular benchmarking framework designed to evaluate fairness-aware
pre-processing techniques on tabular datasets. Built on the AIF360 platform, FairPrep
allows seamless integration of datasets, fairness interventions, and predictive models.
It features a batch-processing interface that enables efficient experimentation and
automatic reporting of fairness and utility metrics. By offering standardised pipelines
and supporting reproducible evaluations, FairPrep fills a critical gap in the fairness
benchmarking landscape and provides a practical foundation for advancing data-level
fairness research.
QueryBridge: One Million Annotated Questions with SPARQL Queries - Dataset for Question
Answering over Knowledge Graphs
- Abdelghny Orogat
- Ahmed El-Roby
Question answering over knowledge graphs (QAKG) involves interpreting natural language
questions and linking them to structured knowledge graphs. Existing benchmark datasets
(e.g., QALD, LC-QuAD) are limited in size and annotation, hindering QAKG model generalization.
To address this, we present QueryBridge, a dataset with over one million annotated
questions paired with SPARQL queries. Each question is tagged with essential elements
(e.g., entities, relationships) and annotated by query shape (e.g., chain, star) to
support complex reasoning.
Building Safer Sites: A Large-Scale Multi-Level Dataset for Construction Safety Benchmark
- Zhenhui Ou
- Dawei Li
- Zhen Tan
- Wenlin Li
- Huan Liu
- Siyuan Song
Construction safety research is a critical field in civil engineering, aiming to mitigate
risks and prevent injuries through the analysis of site conditions and human factors.
However, the limited volume and lack of diversity in existing construction safety
datasets pose significant challenges to conducting in-depth analyses. To address this
research gap, this paper introduces the Construction Safety Dataset (CSDataset), a
well-organized comprehensive multi-level dataset that encompasses incidents, inspections,
and violations sourced from the Occupational Safety and Health Administration
(OSHA). This dataset uniquely integrates structured attributes with unstructured narratives,
facilitating a wide range of approaches driven by machine learning and large language
models. We also conduct preliminary approach benchmarking and various cross-level
analyses using our dataset, offering insights to inform and enhance future efforts
in construction safety. For example, we found that complaint-driven inspections were
associated with a 17.3% reduction in the likelihood of subsequent incidents. Our dataset
and code are released at https://github.com/zhenhuiou/Construction-Safety-Dataset-CSDataset.
PEQQS: a Dataset for Probing Extractive Quantity-focused Question Answering from Scientific
Literature
- Maciej Rybinski
- Necva Bölücü
- Huichen Yang
- Stephen Wan
Question Answering (QA) and Information Retrieval (IR) play a crucial role in information-seeking
pipelines implemented in many emerging AI research assistant applications. Large Language Models (LLMs) have demonstrated exceptional effectiveness
on QA tasks, with Retrieval Augmented Generation (RAG) techniques often boosting the
results. However, in many of those emerging applications, the onus of conducting the
actual literature search falls on the user, i.e. the user searches for the relevant
literature and the LLM-based assistant extracts the solicited answers from each of
the user-supplied documents. The interplay between the quality of the user-conducted
search and the quality of the final results remains understudied.
In this work, we focus on a specific version of such a pipeline, where users aim to
obtain a specific quantity as an extractive answer (e.g., a value of a particular
measurable parameter). To this end, we provide a dataset of 1031 agricultural sciences
abstracts annotated with correct extractive answers. Additionally, this dataset builds on our previous work on quantity-centric search over a corpus of more than 3.3M documents, and therefore also includes 1104 query-document relevance judgments for 39 queries. The availability of both document-level annotations and
corpus-level relevance judgments means that our dataset allows for an end-to-end evaluation
of an information-seeking pipeline consisting of both literature search and the QA
module. We show how our dataset can be used both for the evaluation of extractive
quantity-focused QA from science literature and for exploring the impact of search
on the downstream results, specifically focusing on hallucinations resulting from
processing non-relevant documents with LLMs.
A Use-Case Specific Dataset for Measuring Dimensions of Responsible Performance in
LLM-generated Text
- Alicia Sagae
- Chia-Jung Lee
- Sandeep Avula
- Brandon Dang
- Vanessa Murdock
Current methods for evaluating large language models (LLMs) typically focus on high-level
tasks such as text generation, without targeting a particular AI application. This
approach is not sufficient for evaluating LLMs for Responsible AI dimensions like
fairness, since protected attributes that are highly relevant in one application may
be less relevant in another. In this work, we construct a dataset that is driven by
a real-world application (generate a plain-text product description, given a list
of product features), parameterized by fairness attributes intersected with gendered
adjectives and product categories, yielding a rich set of labeled prompts. We show
how to use the data to identify quality, veracity, safety, and fairness gaps in LLMs,
contributing a proposal for LLM evaluation paired with a concrete resource for the
research community.
Real-E: A Foundation Benchmark for Advancing Robust and Generalizable Electricity
Forecasting
- Chen Shao
- Michael Färber
- Sebastian Pütz
- Benjamin Schäfer
- Yue Wang
- Tobias Käfer
- Zhanbo Huang
- Zhenyi Zhu
Energy forecasting is vital for grid reliability and operational efficiency. Although
recent advances in time series forecasting have led to progress, existing benchmarks
remain limited in spatial and temporal scope and lack multi-energy features. This
raises concerns about their reliability and applicability in real-world deployment.
To address this, we present the Real-E dataset, covering over 74 power stations across 30+ European countries over a 10-year
span with rich metadata. Using Real-E, we conduct an extensive data analysis and benchmark over 20 baselines across various
model types. We introduce a new metric to quantify shifts in correlation structures
and show that existing methods struggle on our dataset, which exhibits more complex
and non-stationary correlation dynamics. Our findings highlight key limitations of
current methods and offer a strong empirical basis for building more robust forecasting
models.
SeLeRoSa: Sentence-Level Romanian Satire Detection Dataset
- Răzvan-Alexandru Smădu
- Andreea Iuga
- Dumitru-Clementin Cercel
- Florin Pop
Satire, irony, and sarcasm are techniques that are typically used humorously or critically,
rather than deceptively; they can occasionally be mistaken for factual reporting,
akin to fake news. These techniques can be applied at a more granular level, allowing
satirical information to be incorporated into news articles. In this paper, we introduce
the first sentence-level dataset for Romanian satire detection for news articles,
called SeLeRoSa. The dataset comprises 13,873 manually annotated sentences spanning
various domains, including social issues, IT, science, and movies. With their rise and recent progress in the natural language processing literature, large language models (LLMs) have demonstrated enhanced capabilities to tackle various tasks in zero-shot settings. We evaluate multiple LLM-based baseline models in both zero-shot
and fine-tuning settings, as well as transformer-based models. Our findings reveal
the current limitations of these models in the sentence-level satire detection task,
paving the way for new research directions.
When Facts Expire: Benchmarking Temporal Validity in Knowledge Graphs
- Thibaut Soulard
- Fatiha Saïs
- Joe Raad
Knowledge Graphs (KGs) are essential in applications like semantic search, question
answering, and decision support. They structure knowledge, validate facts, enable
inference, and increasingly enhance Large Language Models (LLMs) by grounding outputs
in structured, factual data. However, KGs have treated facts as static and timeless,
ignoring the temporal nature of many truths. This leads to outdated or incorrect inferences.
Temporal Knowledge Graphs (TKGs), like Wikidata and YAGO, address this by modeling
the time-bound validity of facts. Much recent work has focused on predicting missing
temporal facts, yet validating existing temporal information, to ensure the reliability
and accuracy of TKGs, remains underexplored. In order to advance this area of research,
we introduce the first benchmark designed to evaluate temporal fact validation methods.
Derived from Wikidata, this benchmark supports systematic, quantitative, and qualitative
comparisons, incorporating diverse assumptions about temporal data (e.g., timestamps,
intervals) and KG structures (e.g., density, depth).
MIRAGE: A Metrics lIbrary for Rating hAllucinations in Generated tExt
- Benjamin Vendeville
- Liana Ermakova
- Pierre De Loor
- Jaap Kamps
Errors in natural language generation, so-called hallucinations, remain a critical
challenge, particularly in high-stakes domains such as healthcare or science communication.
While several automatic metrics have been proposed to detect and quantify hallucinations,
such as FactCC, QAGS, FEQA, and FactAcc, these metrics are often unavailable, difficult
to reproduce, or incompatible with modern development workflows. We introduce MIRAGE, an open-source Python library designed to address these limitations. MIRAGE re-implements
key hallucination evaluation metrics in a unified library built on the Hugging Face
framework, offering modularity, reproducibility, and standardized inputs and outputs.
By adhering to FAIR principles, MIRAGE promotes reproducibility, accelerates experimentation,
and supports the development of future hallucination metrics. We validate MIRAGE by
re-evaluating existing metrics on benchmark datasets, demonstrating comparable performance
while significantly improving usability and transparency.
FinS-Pilot: A Benchmark for Online Financial RAG System
- Feng Wang
- Yiding Sun
- Jiaxin Mao
- Xue Wei
- Danqing Xu
Large language models (LLMs) have demonstrated remarkable capabilities across various
professional domains, with their performance typically evaluated through standardized
benchmarks. In the financial field, the stringent demands for professional accuracy
and real-time data processing often necessitate the use of retrieval-augmented generation
(RAG) techniques. However, the development of financial RAG benchmarks has been constrained
by data confidentiality issues and the lack of dynamic data integration. To address
this issue, we introduce FinS-Pilot, a novel benchmark for evaluating RAG systems
in online financial applications. Constructed from real-world financial assistant
interactions, our benchmark incorporates both real-time API data and text data, organized
through an intent classification framework covering critical financial domains. The
benchmark enables comprehensive evaluation of financial assistants' capabilities in
handling both static knowledge and time-sensitive market information. Through systematic experiments with multiple leading Chinese LLMs, we demonstrate FinS-Pilot's effectiveness
in identifying models suitable for financial applications while addressing the current
gap in specialized evaluation tools for the financial domain. Our work contributes
both a practical evaluation framework and a curated dataset to advance research in
financial NLP systems. The code and dataset are accessible on GitHub.
HUSK: A Hierarchically Structured Urban Knowledge Graph Dataset for Multi-Level Spatial
Tasks
- Qiqi Wang
- Guanjin Wang
- Yihong Pan
- Zhipeng Lin
- Huijia Li
- Qian Liu
- Kaiqi Zhao
Urban spatial tasks span multiple levels, ranging from area-level tasks such as crime prediction and taxi demand forecasting to POI-level tasks such as new store recommendation.
Urban knowledge graphs (UrbanKGs) can enhance these tasks by integrating structured
urban knowledge. However, existing studies face two main issues: most research uses
task-specific UrbanKGs for corresponding single-level predictions, and public UrbanKGs
contain only coarse-grained administrative areas, lacking the rich semantic and spatial
relationships required for multi-level tasks. We propose a Hierarchically Structured
UrbanKG Dataset (HUSK) with an intermediate functional zone layer that bridges and
enriches the understanding across multiple levels, and evaluate it on three area-level
and three POI-level tasks, showing accuracy improvements over single-view baselines.
TalkDep: Clinically Grounded LLM Personas for Conversation-Centric Depression Screening
- Xi Wang
- Anxo Perez
- Javier Parapar
- Fabio Crestani
The increasing demand for mental health services has outpaced the availability of
real training data to develop clinical professionals, leading to limited support for
the diagnosis of depression. This shortage has motivated the development of simulated
or virtual patients to assist in training and evaluation, but existing approaches
often fail to generate clinically valid, natural, and diverse symptom presentations.
In this work, we adopt recent advanced language models as the backbone and propose
a novel clinician-in-the-loop patient simulation pipeline, TalkDep, with access to
diversified patient profiles to develop simulated patients. By conditioning the model
on psychiatric diagnostic criteria, symptom severity scales, and contextual factors,
our goal is to create authentic patient responses that can better support diagnostic
model training and evaluation. We verify the reliability of these simulated patients
with thorough assessments conducted by clinical professionals. The availability of
validated simulated patients offers a scalable and adaptable resource for improving
the robustness and generalisability of automatic depression diagnosis systems.
StoryWriter: A Multi-Agent Framework for Long Story Generation
- Haotian Xia
- Hao Peng
- Yunjia Qi
- Bin Xu
- Juanzi Li
- Hou Lei
- Xiaozhi Wang
Long story generation remains a challenge for existing large language models (LLMs),
primarily due to two main factors: (1) discourse coherence, which requires plot consistency,
logical coherence, and completeness in the long-form generation, and (2) narrative
complexity, which requires an interwoven and engaging narrative. In this paper, we
present StoryWriter, a modular and open-source multi-agent framework for controllable and scalable long
story generation. We conduct both human and automated evaluation, and StoryWriter significantly outperforms existing story generation baselines in both story quality
and length. Furthermore, we use StoryWriter to generate LongStory, a dataset containing about 6,000 high-quality long stories with an average length of 8,000 words. We train Llama3.1-8B and GLM4-9B using supervised fine-tuning on LongStory and develop StoryWriterLLAMA and StoryWriterGLM, which demonstrate advanced performance in long story generation. All code, models,
and data are made publicly available to encourage further development.
Maneno Yetu: Dynamic Corpus Construction and Pretraining for Swahili NLP
- Chaddy Anthony Zawuya
- Alfred Malengo Kondoro
- Diana Rwegasira
- Juma H. Lungo
Swahili occupies a central place in African linguistic landscapes, yet it is significantly
under-resourced in NLP, reflecting a mismatch between speaker population and data
availability. We introduce Maneno Yetu, a dynamic and extensible corpus designed to
address this gap. It is continuously updated with diverse sources such as news articles,
blogs, literature, and educational content. This structure enables robust pretraining
and fine-tuning for Swahili NLP. The evolving nature of the corpus allows for longitudinal
linguistic analysis, providing a unique opportunity to track language change over
time. It also serves as a foundation for creating niche, task-specific datasets in
low-resource settings. Building on Maneno Yetu, we present the Swahili Language Foundational
Model (SLFM), a transformer-based model trained to support core NLP tasks including
tokenization, part-of-speech tagging, machine translation, and abusive language detection.
Both the corpus and model are released publicly to support reproducible research and
foster community-driven development in African language technologies.
UXSim: Towards a Hybrid User Search Simulation
- Saber Zerhoudi
- Michael Granitzer
Simulating nuanced user experiences within complex interactive search systems poses
a distinct challenge for traditional methodologies, which often rely on static user
proxies or, more recently, on standalone large language model (LLM) agents that may
lack deep, verifiable grounding. The true dynamism and personalization inherent in
human-computer interaction demand a more integrated approach. This work introduces
UXSim (https://searchsim.org/uxsim), a novel framework that integrates both approaches.
It leverages grounded data from traditional simulators to inform and constrain the
reasoning of an adaptive LLM agent. This synthesis enables more accurate and dynamic
simulations of user behavior while also providing a pathway for the explainable validation
of the underlying cognitive processes.
C-FAITH: A Chinese Fine-Grained Benchmark for Automated Hallucination Evaluation
- Xu Zhang
- Zhifei Liu
- Jiahao Wang
- Huixuan Zhang
- Fan Xu
- Junzhe Zhang
- Xiaojun Wan
Despite the rapid advancement of large language models, they remain highly susceptible
to generating hallucinations, which significantly hinders their widespread application.
Hallucination research requires dynamic and fine-grained evaluation. However, most
existing hallucination benchmarks (especially in the Chinese language) rely on human annotations, making automatic and cost-effective hallucination evaluation challenging. To address this, we introduce HaluAgent, an agentic framework that automatically constructs fine-grained question-answering (QA) datasets from knowledge documents. Our experiments demonstrate that manually designed rules and prompt optimization can improve the
quality of generated data. Using HaluAgent, we construct C-FAITH, a Chinese QA hallucination
benchmark created from 1,399 knowledge documents obtained from web scraping, totaling
60,702 entries. We comprehensively evaluate 16 mainstream LLMs with our proposed C-FAITH,
providing detailed experimental results and analysis.
PyG-SSL: A Graph Self-Supervised Learning Toolkit
- Lecheng Zheng
- Baoyu Jing
- Zihao Li
- Zhichen Zeng
- Tianxin Wei
- Mengting Ai
- Xinrui He
- Lihui Liu
- Dongqi Fu
- Jiaxuan You
- Hanghang Tong
- Jingrui He
Graph Self-Supervised Learning (SSL) has emerged as a pivotal area of research in
recent years. By engaging in pretext tasks to learn the intricate topological structures
and properties of graphs using unlabeled data, these graph SSL models achieve enhanced
performance, improved generalization, and heightened robustness. Despite the remarkable
achievements of these graph SSL methods, their current implementations pose significant challenges for beginners and practitioners: the complex nature of graph structures, inconsistent evaluation metrics, and concerns regarding reproducibility hinder further progress in this field. Recognizing the growing interest within the research community,
there is an urgent need for a comprehensive, beginner-friendly, and accessible toolkit
consisting of the most representative graph SSL algorithms. To address these challenges,
we present a Graph SSL toolkit named PyG-SSL, which is built upon PyTorch and is compatible
with various deep learning and scientific computing backends. Within the toolkit,
we offer a unified framework encompassing dataset loading, hyper-parameter configuration,
model training, and comprehensive performance evaluation for diverse downstream tasks.
Moreover, we provide beginner-friendly tutorials and the best hyper-parameters of
each graph SSL algorithm on different graph datasets, facilitating the reproduction
of results. The GitHub repository of the library is https://github.com/iDEA-iSAIL-Lab-UIUC/pyg-ssl.
TSD-CT: A Benchmark Dataset for Truthfulness Stance Detection
- Zhengyuan Zhu
- Haiqi Zhang
- Zeyu Zhang
- Chengkai Li
We present TSD-CT (Truthfulness Stance Detection-Claim and Tweet), a benchmark dataset
designed to advance research in truthfulness stance detection. While prior stance
detection datasets focus primarily on political figures, topics, or events, TSD-CT
targets truthfulness stance of social media posts toward factual claims. Truthfulness
stance reflects whether a post endorses a claim as true, rejects it as false, or expresses
no clear position. This focus is particularly valuable for tracking public reactions
to misinformation and for enabling applications that analyze belief dynamics in online
discourse. TSD-CT comprises 5,331 claim-tweet pairs, each annotated into one of five
classes: positive, negative, neutral/no stance, topically different, or problematic.
To ensure annotation quality, we introduce a strategy that uses gold-standard labels
to compute error scores, evaluate annotator performance, and filter out low-quality
contributions. The resulting dataset achieves strong inter-annotator agreement. An
error analysis further highlights frequent sources of confusion, particularly between
neutral/no stance and other classes. The dataset, along with the annotation interface
and codebase, is publicly released to facilitate further research.
SESSION: Demo Papers
RankArena: A Unified Platform for Evaluating Retrieval, Reranking and RAG with Human
and LLM Feedback
- Abdelrahman Abdallah
- Mahmoud Abdalla
- Bhawna Piryani
- Jamshid Mozafari
- Mohammed Ali
- Adam Jatowt
Evaluating the quality of retrieval-augmented generation (RAG) and document reranking
systems remains challenging due to the lack of scalable, user-centric, and multi-perspective
evaluation tools. We introduce RankArena, a unified platform for comparing and analysing
the performance of retrieval pipelines, rerankers, and RAG systems using structured
human and LLM-based feedback as well as for collecting such feedback. RankArena supports
multiple evaluation modes: direct reranking visualisation, blind pairwise comparisons
with human or LLM voting, supervised manual document annotation, and end-to-end RAG
answer quality assessment. It captures fine-grained relevance feedback through both
pairwise preferences and full-list annotations, along with auxiliary metadata such
as movement metrics, annotation time, and quality ratings. The platform also integrates
LLM-as-a-judge evaluation, enabling comparison between model-generated rankings and
human ground truth annotations. All interactions are stored as structured evaluation
datasets that can be used to train rerankers, reward models, judgment agents, or retrieval
strategy selectors. Our platform is publicly available at https://rankarena.ngrok.io/,
and the demo video is available at https://youtu.be/jIYAP4PaSSI.
White Rabbit: Demonstrating Online KG Pathfinding Using Embeddings
- Panagiotis Antivasis
- Giannis Vassiliou
- Georgios Tsamis
- Paraskevi Zacharia
- Eleftheria Barka
- Julian Gini
- Argyri Kyriakaki
- Giorgos Andreadakis
- Ioannis Christodoulakis
- Nikos Papadakis
- Haridimos Kondylakis
The paper introduces White Rabbit, a novel method for discovering high-quality, meaningful
paths between entities in online Knowledge Graphs (KGs). Traditional exploration methods,
such as SPARQL endpoints, struggle due to the large size and complexity of KGs. The
proposed approach addresses this by introducing the problem of context-aware path
finding, ensuring that retrieved paths are coherent and involve highly relevant entities.
White Rabbit uses embeddings to score entity neighbors, a queue-based prioritization
mechanism, and an iterative refinement process to improve efficiency and relevance.
The system is demonstrated live, allowing participants to test it and compare against
baseline methods (structural approaches, pretrained embeddings, and large language
models). Results show that White Rabbit enhances both the efficiency of exploration
and the quality of discovered paths.
SQuAI: Scientific Question-Answering with Multi-Agent Retrieval-Augmented Generation
- Ines Besrour
- Jingbo He
- Tobias Schreieder
- Michael Färber
We present SQuAI (https://squai.scads.ai/), a scalable and trustworthy multi-agent
retrieval-augmented generation (RAG) framework for scientific question answering (QA)
with large language models (LLMs). SQuAI addresses key limitations of existing RAG
systems in the scholarly domain, where complex, open-domain questions demand accurate
answers, explicit claims with citations, and retrieval across millions of scientific
documents. Built on over 2.3 million full-text papers from arXiv.org, SQuAI employs
four collaborative agents to decompose complex questions into sub-questions, retrieve
targeted evidence via hybrid sparse-dense retrieval, and adaptively filter documents
to improve contextual relevance. To ensure faithfulness and traceability, SQuAI integrates
in-line citations for each generated claim and provides supporting sentences from
the source documents. Our system improves faithfulness, answer relevance, and contextual
relevance by up to +0.088 (12%) over a strong RAG baseline. We further release a benchmark
of 1,000 scientific question-answer-evidence triplets to support reproducibility.
With transparent reasoning, verifiable citations, and domain-wide scalability, SQuAI
demonstrates how multi-agent RAG enables more trustworthy scientific QA with LLMs.
KnowFE: A Hybrid AI System for Explainable Feature Engineering using Knowledge-Guided
Reinforcement Learning
- Mohamed Bouadi
- Arta Alavi
- Salima Benbernou
- Mourad Ouziri
Feature engineering is a critical yet often manual step in building effective machine
learning models. While automated machine learning (AutoML) has streamlined many aspects
of model development, the generation of high-quality, interpretable features remains
a key bottleneck, requiring case-by-case domain knowledge and significant effort.
This challenge highlights the importance of automated feature engineering (AutoFE)
as a critical component within the AutoML pipeline. To address this, we recently proposed
SMART, a novel AutoFE approach that combines knowledge graph reasoning and deep reinforcement
learning to guide the generation of interpretable features. In this demonstration,
we introduce KnowFE, a web-based AutoFE platform powered by SMART. KnowFE enables
users to generate high-quality, human-understandable features without writing any
code, striking a balance between explainability and predictive performance. With a
user-friendly interface, it empowers data practitioners to efficiently enhance machine
learning workflows across diverse domains. A video demonstration is available at https://www.KnowFE.com.
SDD: Shape-aware Data-driven Attention Mechanism for Time Series Analysis
- Yanyun Cao
- Rundong Zuo
- Rui Cao
- Byron Choi
- Jianliang Xu
- Sourav S Bhowmick
Multivariate time series (MTS) analysis has extensive applications in areas such as human activity recognition, healthcare, and economics, among others. Recently, Transformer approaches have been specifically designed for MTS and have consistently reported superior performance. In this paper, we demonstrate a software system for a recent efficient shape-aware Transformer (SDD), where time-series subsequences (a.k.a. shapes) are made available to users for investigation. First, a time-series Transformer, called SVP-T, takes shapes, together with their variable and position information (VP information), as input to the training of a Transformer model. These shapes are computed from different variables and time intervals, enabling the Transformer model to learn dependencies simultaneously across both time and variables. Second, a data-driven kernel-based attention mechanism, called DARKER, reduces the time complexity of training Transformer models from O(N²) to O(N), where N is the number of inputs. As a result, training with DARKER offers about a 3x-4x speedup over vanilla Transformers. In this demo, we present the first system (SDD) that integrates SVP-T and DARKER. In particular, SDD visualizes SVP-T's attention matrix and allows users to explore key shapes that have high attention weights. Furthermore, users can use SDD to choose the shape input for training a new model, further balancing efficiency and accuracy.
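As a loose, generic illustration of how a kernel feature map can bring attention cost down from O(N²) to O(N) (an assumed standard linear-attention sketch, not the DARKER mechanism itself):

```python
# Generic linear-attention sketch (assumed illustration, not DARKER):
# with a non-negative feature map phi, attention can be rewritten so the
# sequence length N enters only linearly, because phi(K)^T V is computed once.
import numpy as np

def phi(x):
    # simple non-negative feature map; real kernel attention methods differ
    return np.maximum(x, 0.0) + 1e-6

def linear_attention(Q, K, V):
    """Q, K: (N, d); V: (N, d_v). Cost is O(N * d * d_v) instead of O(N^2)."""
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                      # (d, d_v): summary of keys and values
    norm = Qf @ Kf.sum(axis=0)         # (N,): per-query normalizer
    return (Qf @ kv) / norm[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(512, 32)) for _ in range(3))
out = linear_attention(Q, K, V)        # shape (512, 32)
```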
The ReQAP System for Question Answering over Personal Information
- Philipp Christmann
- Gerhard Weikum
Personal information is abundant on users' devices, from structured data in calendars, shopping records, or fitness tools, to unstructured content in mail and social media posts. This work presents the ReQAP system, which supports users with answers to complex
questions that involve filters, joins and aggregation over heterogeneous sources.
The unique trait of ReQAP is that it recursively decomposes questions and incrementally
builds an operator tree for execution. Both the question interpretation and the individual
operators make smart use of light-weight language models, with judicious fine-tuning.
The demo showcases the rich functionality for advanced user questions, and also offers
detailed tracking of how the answers are computed by the operators in the execution
tree. Being able to trace answers back to the underlying sources is vital for human
comprehensibility and user trust in the system.
Explain and Monitor Deep Learning Models for Computer Vision using Obz AI
- Neo Christopher Chung
- Jakub Binda
Deep learning has transformed computer vision (CV), achieving outstanding performance
in classification, segmentation, and related tasks. Such AI-based CV systems are becoming
prevalent, with applications spanning from medical imaging to surveillance. State-of-the-art models such as convolutional neural networks (CNNs) and vision transformers
(ViTs) are often regarded as ''black boxes,'' offering limited transparency into their
decision-making processes. Despite recent advances in explainable AI (XAI), explainability
remains underutilized in practical CV deployments. A primary obstacle is the absence
of integrated software solutions that connect XAI techniques with robust knowledge
management and monitoring frameworks.
To close this gap, we have developed Obz AI, a comprehensive software ecosystem designed to facilitate state-of-the-art explainability
and observability for vision AI systems. Obz AI provides a seamless integration pipeline,
from a Python client library to a full-stack analytics dashboard. With Obz AI, a machine
learning engineer can easily incorporate advanced XAI methodologies, extract and analyze
features for outlier detection, and continuously monitor AI models in real time. By
making the decision-making mechanisms of deep models interpretable, Obz AI promotes
observability and responsible deployment of computer vision systems.
ClimBurst: A Dynamic Visualization Tool to Display Climatological Anomalies over Time
and Space
- Guillaume Coulaud
- Benoit Lange
- Dennis Shasha
- Audrey Brouillet
- Reza Akbarinia
- Florent Masseglia
Detecting abnormal climate events across temporal and spatial scales is crucial to
the understanding of local and regional climate trends. This demonstration introduces
ClimBurst, a dynamic tool to detect climate bursts, which are unusually high or low
values of one or more climate variables over some time interval. ClimBurst detects
bursts without prior assumptions about their temporal duration. The demonstration
will allow users to interact directly with our system to see a summary showing
the presence/absence of bursts over a user-specified year and spatial range. The demonstration
will also allow users to perform time-travel queries to see how bursts propagate over
space and time.
CALLM: A Framework for Systematic Contrastive Analysis of Large Language Models
- Reinhard Fritsch
- Adam Jatowt
This study addresses the challenges of analyzing discrepancies between different large
language models (LLMs). To facilitate the automatic exploration of these differences,
we propose a novel system called CALLM (Contrastive Analyzer of LLMs) that systematically
compares the outputs of two LLM versions based on user-defined queries. The system
first generates a hierarchical topic structure rooted in a user-specified query, allowing for an organized comparison of topical categories. Subsequently, it evaluates the text generated by both LLMs to identify differences in knowledge and
information presentation. This fully automated approach not only streamlines the identification of differences
in knowledge stored by LLMs, model-specific characteristics, and performance variations,
but can also enhance our understanding of architectural and training differences between
compared LLMs. Our work contributes to the development of more transparent machine
learning models and is meant to foster research in model evaluation and comparative
analysis.
HealthGenie: A Knowledge-Driven LLM Framework for Tailored Dietary Guidance
- Fan Gao
- Xinjie Zhao
- Ding Xia
- Zhongyi Zhou
- Rui Yang
- Jinghui Lu
- Hang Jiang
- Chanjun Park
- Irene Li
Seeking dietary guidance often requires navigating complex nutritional knowledge while
considering individual health needs. To address this, we present HealthGenie, an interactive platform that leverages the interpretability of knowledge graphs (KGs)
and the conversational power of large language models (LLMs) to deliver tailored dietary
recommendations alongside integrated nutritional visualizations for fast, intuitive
insights. Upon receiving a user query, HealthGenie performs intent refinement and maps user's needs to a curated nutritional knowledge
graph. The system then retrieves and visualizes relevant subgraphs, while offering
detailed, explainable recommendations. Users can interactively adjust preferences
to further tailor results. A within-subject study and quantitative analysis show that
HealthGenie reduces cognitive load and interaction effort while supporting personalized, health-aware
decision-making.
STORM: Spatio-Temporal Similar Trajectory Retrieval on Non-Uniform Maritime Data
- Xiaolin Han
- Yonghao Zhou
- Chenhao Ma
- Fang Li
- Xuequn Shang
Similar trajectory retrieval is crucial for maritime trajectory data analysis. However,
due to issues such as errors in maritime positioning devices and the accuracy limitations
of satellite positioning systems at sea, maritime trajectory data often exhibit characteristics
of non-uniform sampling. Existing algorithms struggle to effectively model the irregularity
of non-uniformly sampled maritime trajectories, leading to reduced performance in
similar trajectory retrieval. In this demonstration, we present STORM, a system designed
to effectively retrieve the top-k similar trajectories, which supports both user-specified
and automated query settings. STORM utilizes a learnable Fourier-based encoding method
to efficiently extract spatiotemporal features from non-uniform trajectories, significantly
enhancing the model's performance in similar trajectory retrieval. Our demonstration
shows that, compared to state-of-the-art (SOTA) methods, STORM achieves a 41.9% improvement
in performance for similar trajectory retrieval on non-uniform maritime data. Our
demonstration video is available at https://github.com/itszzzyyy/STORM.
Quantum Deepflow: A Quantum-Integrated Forecasting Platform for Strategic Decisions
in Raw Material Procurement
- Charmgil Hong
- Doohee Chung
- Jongyeong Kim
- Heewon Jung
We present Quantum Deepflow, a forecasting and decision support platform that integrates
classical and quantum sequence modeling to address volatility and data irregularity
in raw material procurement. The system combines an LSTM autoencoder with a Quantum
Long Short-Term Memory (QLSTM) model, which enables robust and accurate forecasts
from noisy time-series inputs. Users can interact with the platform through a visual
interface that links forecast outputs to strategic key performance indicators such
as purchase timing, cost estimates, and inventory risk. In a real-world deployment
at a Korean steel manufacturer, the system achieved a 32.5% reduction in overstocking
and saved $1.8 million in inventory costs. This work demonstrates a practical approach
to exposing quantum-enhanced forecasting capabilities through an automated, cloud-based
interface that bridges the gap between emerging quantum technology and enterprise-scale
decision-making.
How to Make Museums More Interactive? Case Study of Artistic Chatbot
- Filip J. Kucia
- Anna Wróblewska
- Bartosz Grabek
- Szymon D. Trochimiak
Conversational agents powered by Large Language Models (LLMs) are increasingly utilized
in educational settings, in particular in individual closed digital environments,
yet their potential adoption in physical learning environments such as cultural heritage
sites, museums, and art galleries remains relatively unexplored. In this study, we
present Artistic Chatbot, a voice-to-voice RAG-powered chat system to support informal
learning and enhance visitor engagement during a live art exhibition celebrating the
15th anniversary of the Faculty of Media Art at the Warsaw Academy of Fine Arts, Poland.
The question answering (QA) chatbot responded to free-form spoken questions in Polish
using the context retrieved from a curated, domain-specific knowledge base consisting
of 226 documents provided by the organizers, including faculty information, art magazines,
books, and journals. We describe the key aspects of the system architecture and user
interaction design, as well as discuss the practical challenges associated with deploying
chatbots at public cultural sites. Our findings, based on interaction analysis, demonstrate
that chatbots such as Artistic Chatbot effectively maintain responses grounded in
exhibition content (60% of responses directly relevant), even when faced with unpredictable
queries outside the target domain, showing their potential for increasing interactivity
in public cultural sites.
During the demo presentation, the audience will be invited to query our Artistic Chatbot,
which adopts the persona of an artificial art curator, a role that involves responding
to questions while simultaneously assessing their relevance to the exhibition. The
demo video is available at https://github.com/cinekucia/artistic-chatbot-cikm2025.
EdgeSLU: 1.58-bit Voice Control Framework
- Seungeon Lee
- Junuk Jung
- Sanghyun Jung
- Changbeom Kang
- Jaeyoon Yoo
Natural-language voice interfaces promise ubiquitous smart environment control, yet
cloud dependence incurs latency, connectivity, and privacy costs that are intolerable
in safety-critical or bandwidth-limited settings. We introduce EdgeSLU, an entirely on-device speech pipeline that marries an extremely low 1.58-bit mixed-precision quantization scheme with an auto-tuned SIMD-centric kernel, allowing
deployment on commodity edge hardware with only tens of MB of memory. On a Raspberry
Pi 5 (Arm Cortex A76), the Speech-To-Text (STT) engine achieves 6.37% WER in 35 MB
RAM and 2.1s latency, while the quantized Natural Language Understanding (NLU) model delivers
93.33% intent accuracy in 0.48s and 23 MB RAM, yielding an end-to-end interaction time of 2.6s. A lightweight paraphrase-based data
generator bootstraps rich training sets from few examples, eliminating prohibitive
annotation overhead. Demonstrated through offline control of Philips Hue lamps, EdgeSLU
shows that aggressive mixed-precision quantization plus hardware-aware inference enables
practical, privacy-preserving voice control on off-the-shelf edge devices.
PlaceSim: An LLM-based Interactive Platform for Human Behavior Simulation in Physical
Facilities
- Suhyeon Lee
- Youngjun Yu
- Donghyuk Shin
- Rita Singh
Physical facility design faces a fundamental cold-start problem: predicting human
behavior in non-existent spaces. Traditional surveys and observational studies create
gaps between stated preferences and actual usage, while existing simulation tools
require significant technical expertise, limiting accessibility. We introduce PlaceSim,
a web-based platform leveraging Large Language Models (LLMs) to simulate realistic
human behavior in facilities through a zero-code interface. PlaceSim employs a Persona-Environment-Scenario
(P.E.S.) framework that structures LLM reasoning through context-aware AI personas
with transparent decision-making processes. The platform provides interactive facility
design, AI-driven persona generation, live simulation with reasoning visualization,
and what-if analysis for scenario comparison. Evaluated on 18 months of real-world
apartment facility data (789,238 usage records from 8,435 residents), our zero-shot
approach achieves Jensen-Shannon Divergence scores as low as 0.006, outperforming
both supervised learning methods and existing LLM-based tools like SocioVerse without
requiring training data. PlaceSim establishes new benchmarks for spatial behavior
prediction while providing immediate, actionable insights for architects, urban planners,
and facility managers through systematic simulation. The platform is available at
https://simulation-viewer.vercel.app/.
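For readers unfamiliar with the reported metric, a minimal sketch of the standard Jensen-Shannon divergence between a simulated and an observed usage distribution (assumed textbook definition, not PlaceSim's internal code) is given below.

```python
# Hypothetical sketch of the Jensen-Shannon divergence used as the reported
# similarity metric; lower values indicate closer simulated vs. observed behavior.
import numpy as np

def jensen_shannon_divergence(p, q, eps=1e-12):
    """JSD between two discrete distributions p and q (natural-log base)."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# e.g., simulated vs. observed facility-usage shares over four time slots (toy numbers)
simulated = [0.10, 0.25, 0.40, 0.25]
observed  = [0.12, 0.24, 0.38, 0.26]
print(jensen_shannon_divergence(simulated, observed))  # small value ~ close match
```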
Achoio: A Skill-Aware Evaluation Management System for Text-To-Speech Research
- Haowei Lou
- Hye-Young Paik
- Basem Suleiman
- Wen Hu
- Lina Yao
Human subjective evaluation plays a crucial role in evaluating speech-related generative
tasks such as text-to-speech (TTS) generation. However, current practices are often
constrained by limited scalability, fragmented workflows, and inconsistent rating
reliability. Researchers frequently rely on manual methods or general-purpose crowdsourcing
systems, where recruiting appropriately skilled listeners is challenging, and result
analysis is labor-intensive. In this work, we introduce Achoio, a dedicated end-to-end
online system designed to streamline and scale human evaluation for the TTS research
community. Achoio allows researchers to create and manage evaluation projects, upload
synthesized speech samples, and automatically match them with qualified listeners
based on linguistic proficiency and domain knowledge. The system provides built-in
tools for project status tracking, result aggregation and visualization. In this demonstration,
we will walk through the core features of Achoio, including intuitive project setup,
a skill-based listener-matching algorithm, and automated analytics. By addressing the
limitations of existing workflows, Achoio offers a scalable, domain-aware, and analysis-ready
solution for conducting high-quality subjective TTS evaluations. Our system is live
and can be found at https://www.achoio.com. A demo is available on YouTube at https://youtu.be/Ugjj3_YooSM.
LLM4IA: Index Advising Via Large Language Models
- Xian Lyu
- Junbiao Zhang
- Yihang Zheng
- Guoliang Li
- Chen Lin
Recently, large language models (LLMs) have demonstrated strong potential to solve
database problems. However, LLMs still face two challenges in solving the index selection
problem: (1) representing the workload in an LLM-friendly form and (2) finding the
optimal index set. To solve these challenges, we propose LLM4IA, an LLM-based index
selection method that can recommend indexes for any analytical workload directly on
any database instance. LLM4IA produces a concise natural-language description of the workload by extracting and sorting predicates while completely avoiding numerical input. LLM4IA
adopts an iterative index selection process by repeatedly improving previous index
candidates and summarizing effective candidates. Experiments on TPC-H and TPC-DS show
that LLM4IA surpasses the near-optimal index advisor Extend by 5%-10%. Our demonstration
highlights how LLM4IA recommends high-quality indexes for a new database instance
without expensive retraining or fine-tuning.
ESPRESSO: Privacy-Preserving Keyword Search on Decentralized Data with Differential
Visibility Constraints
- Mohamed Ragab
- Mohamed Bahrani
- Helen Oliver
- Thanassis Tiropanis
- Alexandra Poulovassilis
- Adriane Chapman
- George Roussos
We present ESPRESSO, a system designed for scalable and privacy-preserving keyword
search in decentralized data cooperatives. It addresses the challenges that differential
access control (allowing different data access rights for different search parties)
poses to de- centralized search by leveraging decentralized indexing, metadata-driven
source selection, and decentralized ranking techniques. The system ensures that search
parties can only access data within their data visibility scope, while maintaining
high retrieval efficiency and result quality. This demo will showcase its functionality
through interactive scenarios, including live querying, dynamic source selection,
and real-time visualization of search results. The audience will have hands-on interaction
with the system, exploring its application in real-world scenarios, such as in the
healthcare domain.
terazi: AI Fairness Tool for Doubly Imbalanced Data
- Asli Umay Ozturk
- Yigit Sever
- Ata Yalcin
- Viktoria Pauw
- Stephan Hachinger
- Ismail Hakki Toroslu
- Pinar Karagoz
The field of Artificial Intelligence (AI) fairness focuses on developing unbiased
approaches for machine learning problems, with many contributions and ready-to-use
tools. However, existing solutions fall short when both sensitive attributes and target
labels have imbalanced representations in a given dataset. Our proposed algorithm
and its tool, terazi, aim to provide a fair AI solution for this doubly imbalanced
case. The proposed solution is based on finding the optimal distribution within the
imbalanced data to balance fairness and classification performance, and the tool facilitates
using this solution. In this demonstration, we showcase the capabilities of our algorithm,
and the easy-to-use GUI of our web application for data scientists, researchers, and
AI practitioners.
AnDri: A System for Anomaly and Drift co-Detection
- Jongjun Park
- Ziqi Guo
- Fei Chiang
The presence of concept drift poses challenges for anomaly detection in time series.
While anomalies are caused by undesirable changes in the data, differentiating abnormal
changes from varying normal behaviours is difficult due to differing frequencies of
occurrence, varying time intervals at which normal patterns occur, and the difficulty of identifying similarity thresholds that separate normal from abnormal sequences. Differentiating
between concept drift and anomalies is critical for accurate analysis as studies have
shown that the compounding effects of error propagation in downstream data analysis
tasks lead to lower detection accuracy and increased overhead due to unnecessary model
updates. Unfortunately, existing work has largely explored anomaly detection and concept
drift detection in isolation. We develop AnDri, a system for Anomaly detection in
the presence of Drift, which enables users to interactively co-explore the interaction
of anomalies and drift. Our system demonstration provides two motivating scenarios
that extend existing anomaly detection baselines with partial labels towards improved
co-detection accuracy, and highlights the superiority of AnDri over these baselines.
L3X: Long Object List Extraction from Long Documents
- Sneha Singhania
- Simon Razniewski
- Gerhard Weikum
Information extraction with LLMs is typically geared toward extracting individual
subject-predicate-object (SPO) triples from short factual texts such as Wikipedia
or news articles. In contrast, the L3X methodology tackles the task of extracting
long lists from long texts: given a target subject S and predicate P, the goal is to extract the complete list
of all objects O for which SPO holds. This is especially challenging over long texts,
like entire books or large web crawls, where many objects are long-tail entities.
We demonstrate L3X, a web-based system designed for this previously unexplored task.
L3X comprises recall-oriented candidate generation using LLMs in RAG mode, with
novel methods for ranking and batching passages, followed by precision-oriented scrutinization.
Our demo supports exploring multiple configurations, including LLM-only and RAG baselines,
showcasing use cases like fiction-character relations from book series (e.g., 50+
friends of Harry Potter) and business relations from web pages (e.g., CEOs of Toyota).
The Temporal Game: A New Perspective on Temporal Relation Extraction
- Hugo Sousa
- Ricardo Campos
- Alípio Jorge
In this paper we demo the Temporal Game, a novel approach to temporal relation extraction
that casts the task as an interactive game. Instead of directly annotating interval-level
relations, our approach decomposes them into point-wise comparisons between the start
and end points of temporal entities. At each step, players classify a single point
relation, and the system applies temporal closure to infer additional relations and
enforce consistency. This point-based strategy naturally supports both interval and
instant entities, enabling more fine-grained and flexible annotation than any previous
approach. The Temporal Game also lays the groundwork for training reinforcement learning
agents, by treating temporal annotation as a sequential decision-making task. To showcase
this potential, the demo presented in this paper includes a Game mode, in which users
annotate texts from the TempEval-3 dataset and receive feedback based on a scoring
system, and an Annotation mode, which allows custom documents to be annotated and the resulting
timeline to be exported. Therefore, this demo serves both as a research tool and an
annotation interface. The demo is publicly available at https://temporal-game.inesctec.pt,
and the source code is open-sourced to foster further research and community-driven
development in temporal reasoning and annotation.
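To illustrate the point-wise decomposition, the following minimal sketch (illustrative names only, not the demo's implementation) represents each temporal entity by its start and end points, records point relations as they are classified, and applies a simple transitive-closure step to infer additional relations, mirroring the consistency enforcement described above.

```python
from itertools import product

class PointTimeline:
    """Stores point relations ('<' and '=') and closes them under transitivity."""

    def __init__(self):
        self.before = set()   # pairs (a, b) meaning point a occurs before point b
        self.equal = set()    # pairs (a, b) meaning the two points coincide

    def add(self, a, rel, b):
        if rel == "<":
            self.before.add((a, b))
        elif rel == ">":
            self.before.add((b, a))
        elif rel == "=":
            self.equal.update({(a, b), (b, a)})
        self.close()

    def close(self):
        # Repeatedly apply transitivity (a<b, b<c => a<c) and substitution
        # through equalities until no new facts appear.
        changed = True
        while changed:
            changed = False
            new = set()
            for (a, b), (c, d) in product(self.before, self.before):
                if b == c and (a, d) not in self.before:
                    new.add((a, d))
            for (a, b) in self.equal:
                for (c, d) in list(self.before):
                    if c == b and (a, d) not in self.before:
                        new.add((a, d))
                    if d == b and (c, a) not in self.before:
                        new.add((c, a))
            if new:
                self.before |= new
                changed = True

# Interval relations follow from point relations, e.g. "A BEFORE B" holds
# exactly when end(A) < start(B).
tl = PointTimeline()
tl.add(("A", "end"), "<", ("B", "start"))
tl.add(("B", "end"), "<", ("C", "start"))
for e in ("A", "B", "C"):                       # each interval's start precedes its end
    tl.add((e, "start"), "<", (e, "end"))
print((("A", "end"), ("C", "start")) in tl.before)  # True: A before C was inferred
```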
Compare: A Framework for Scientific Comparisons
- Moritz Staudinger
- Wojciech Kusa
- Matteo Cancellieri
- David Pride
- Petr Knoth
- Allan Hanbury
Navigating the vast and rapidly growing sea of academic publications to identify
institutional synergies, benchmark research output, and pinpoint key contributions
has become a daunting task. Existing tools provide useful overviews
or single-document insights, but none supports structured, qualitative comparisons
across institutions or publications. To address this, we demonstrate Compare, a novel
framework that tackles this challenge by enabling sophisticated long-context comparisons
of scientific contributions. Compare empowers users to explore and analyze research
overlaps and differences at both the institutional and publication granularity, all
driven by user-defined questions and automatic retrieval over online resources. For
this, we leverage Retrieval-Augmented Generation over evolving data sources to foster
long-context knowledge synthesis. Unlike traditional scientometric tools, Compare
goes beyond quantitative indicators by providing qualitative, citation-supported comparisons.
A Demonstration of PKGem: Secure Enrichment of Personal Knowledge Graphs
- Junzhou Su
- Sriram Nutulapati
- Chang Ge
We present PKGem, a system that provides an end-to-end secure solution to enrich personal
knowledge graphs in mobile environments. This task faces two core challenges: First,
the proprietary, user-centric, and locally stored nature of personal knowledge graphs
makes collaborative enrichment with socially connected peers a privacy concern. Second,
the mobile environment imposes strict constraints on resources and computation cost, requiring
lightweight and efficient design. PKGem addresses both challenges by leveraging cryptographic
techniques to enable secure data enrichment across personal knowledge graphs, while
remaining practical under mobile constraints. The system is implemented as an Android
application and supports a variety of real-world usage scenarios. The code of PKGem
is available at https://github.com/golden-eggs-lab/pkgem, with a demonstration video
link included in the repository.
MMM-fair: An Interactive Toolkit for Exploring and Operationalizing Multi-Fairness
Trade-offs
- Swati Swati
- Arjun Roy
- Emmanouil Panagiotou
- Eirini Ntoutsi
Fairness-aware classification requires balancing performance and fairness, often intensified
by intersectional biases. Conflicting fairness definitions further complicate the
task, making it difficult to identify universally fair solutions. Despite growing
regulatory and societal demands for equitable AI, popular toolkits offer limited support
for exploring multi-dimensional fairness and related trade-offs. To address this,
we present mmm-fair, an open-source toolkit leveraging boosting-based ensemble approaches
that dynamically optimizes model weights to jointly minimize classification errors
and diverse fairness violations, enabling flexible multi-objective optimization. The
system empowers users to deploy models that align with their context-specific needs
while reliably uncovering intersectional biases often missed by state-of-the-art methods.
In a nutshell, mmm-fair uniquely combines in-depth multi-attribute fairness, multi-objective
optimization, a no-code, chat-based interface, LLM-powered explanations, interactive
Pareto exploration for model selection, custom fairness constraint definition, and
deployment-ready models in a single open-source toolkit, a combination rarely found
in existing fairness tools. Demo walkthrough available at: https://youtu.be/_rcpjlXFqkw.
iMask: Towards a Smart Mask Network Prototype for Monitoring Respiratory Viruses
- Emma Tong
- Cynthia Smyser
- Chen Chen
Although the impact of the COVID-19 pandemic has largely faded over the years, face
masks remain a familiar presence in society, a safeguard to fall back on when sickness
makes its rounds. However, we often wear masks when they are not necessary and, more
concerningly, fail to wear them when they are truly needed. Inefficient viral
tracking methods further exacerbate the issue as they often do not alert the public
and healthcare officials to oncoming or currently happening outbreaks fast enough.
To combat these issues, new smart masks have been developed recently, equipped with
biosensors to detect respiratory viruses in the air. However, these masks have their
own drawbacks, including limited detection accuracy, narrow detection scope, and a small
population of beneficiaries. In response, this work presents a first-of-its-kind prototype
named iMask to augment current smart masks. By connecting the mask wearer's smartphone to
the Internet, iMask improves detection accuracy. At its core, iMask employs multivariate
time-series (1) imputation algorithms to alleviate the data scarcity issue and (2) forecasting
algorithms to predict future viral levels. Based on shared, imputed, and forecast viral data,
iMask further leverages map services to create a viral concentration map that significantly
expands the set of beneficiaries.
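As a toy illustration of the two algorithmic building blocks named above (not the actual iMask pipeline, which uses multivariate time-series models), the sketch below imputes gaps in sparse sensor readings by carrying the last observation forward and forecasts the next viral-level readings with a one-lag autoregressive fit.

```python
import numpy as np

def impute_locf(series):
    """Fill NaNs with the last observed value (last observation carried forward)."""
    out, last = [], np.nan
    for x in series:
        last = x if not np.isnan(x) else last
        out.append(last)
    return np.array(out)

def forecast_ar1(series, horizon=3):
    """Fit x_t ~ a * x_{t-1} + b by least squares and roll it forward."""
    x_prev, x_next = series[:-1], series[1:]
    a, b = np.polyfit(x_prev, x_next, deg=1)
    preds, cur = [], series[-1]
    for _ in range(horizon):
        cur = a * cur + b
        preds.append(cur)
    return np.array(preds)

readings = np.array([0.10, np.nan, 0.14, np.nan, np.nan, 0.21, 0.25])
filled = impute_locf(readings)
print(filled)
print(forecast_ar1(filled))
```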
ReportGRI: Automating GRI Alignment and Report Assessment
- Aida Usmanova
- Rana Abdullah
- Debayan Banerjee
- Markus Leippold
- Ricardo Usbeck
Organisations disclose their sustainability performance in corporate sustainability
reports (CSRs). CSRs vary widely in structure and depth depending on the reporting
framework. Such disparity, together with report complexity and volume, poses significant
challenges to transparency, comparability and standardisation. To address this problem,
we introduce ReportGRI, an automated system for Global Reporting Initiative (GRI)
indexing and qualitative assessment of CSRs. The interactive framework leverages information
retrieval techniques and zero-shot prompting to enable GRI disclosure-based report
indexing and report coverage assessment by visualising well-covered topics and reporting
gaps. The tool facilitates scalable and explainable benchmarking of Environmental,
Social and Governance (ESG) reporting quality, enhancing report interpretation, transparency,
and corporate accountability. The system is open-sourced on GitHub together with an
introductory video.
MedSEBA: Synthesizing Evidence-Based Answers Grounded in Evolving Medical Literature
- Juraj Vladika
- Florian Matthes
In the digital age, people often turn to the Internet in search of medical advice
and recommendations. With the increasing volume of online content, it has become difficult
to distinguish reliable sources from misleading information. Similarly, millions of
medical studies are published every year, making it challenging for researchers to
keep track of the latest scientific findings. These evolving studies can reach differing
conclusions, which is not reflected in traditional search tools. To address these
challenges, we introduce MedSEBA, an interactive AI-powered system for synthesizing
evidence-based answers to medical questions. It utilizes the power of Large Language
Models to generate coherent and expressive answers, but grounds them in trustworthy
medical studies dynamically retrieved from the research database PubMed. The answers
consist of key points and arguments, which can be traced back to respective studies.
Notably, the platform also provides an overview of the extent to which the most relevant
studies support or refute the given medical claim, and a visualization of how the
research consensus evolved through time. Our user study revealed that medical experts
and lay users find the system usable and helpful, and the provided answers trustworthy
and informative. This makes the system well-suited for both everyday health questions
and advanced research insights.
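As a hedged illustration of the two overview views described above (the schema is assumed and this is not MedSEBA's code), the sketch below derives a support/refute balance and its evolution over publication years from per-study stance labels.

```python
from collections import defaultdict

def consensus_over_time(studies):
    """studies: iterable of dicts with 'year' and 'stance' in {'support', 'refute'}."""
    per_year = defaultdict(lambda: {"support": 0, "refute": 0})
    for s in studies:
        per_year[s["year"]][s["stance"]] += 1
    timeline, running = {}, {"support": 0, "refute": 0}
    for year in sorted(per_year):
        for k in running:
            running[k] += per_year[year][k]
        total = running["support"] + running["refute"]
        timeline[year] = running["support"] / total   # cumulative share supporting the claim
    return timeline

studies = [
    {"year": 2016, "stance": "support"},
    {"year": 2019, "stance": "refute"},
    {"year": 2023, "stance": "support"},
    {"year": 2023, "stance": "support"},
]
print(consensus_over_time(studies))  # {2016: 1.0, 2019: 0.5, 2023: 0.75}
```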
An Interventional Approach to Real-Time Disaster Assessment via Causal Attribution
- Saketh Vishnubhatla
- Alimohammad Beigi
- Rui Heng Foo
- Umang Goel
- Ujun Jeong
- Bohan Jiang
- Adrienne Raglin
- Huan Liu
Traditional disaster analysis and modelling tools for assessing the severity of a
disaster are predictive in nature. Based on past observational data, these tools
predict how the current input state (e.g., environmental conditions, situation reports)
results in a severity assessment. However, these systems are not meant to be interventional
in the causal sense, where the user can modify the current input state to simulate
counterfactual ''what-if'' scenarios. In this work, we provide an alternative interventional
tool that complements traditional disaster modelling tools by leveraging real-time
data sources like satellite imagery, news, and social media. Our tool also helps users
understand the causal attribution of different factors to the estimated severity over any given
region of interest. In addition, we provide actionable recourses that would enable
easier mitigation planning. Our source code is publicly available.
JustEva: A Toolkit to Evaluate LLM Fairness in Legal Knowledge Inference
- Zongyue Xue
- Siyuan Zheng
- Shaochun Wang
- Yiran Hu
- Yuxin Yao
- Shengran Wang
- Haitao Li
- Qingyao Ai
- Yiqun Liu
- Yun Liu
- Weixing Shen
The integration of Large Language Models (LLMs) into legal practice raises pressing
concerns about judicial fairness, particularly due to the nature of their ''black-box''
processes. This study introduces JustEva, a comprehensive, open-source evaluation toolkit designed to measure LLM fairness
in legal tasks. JustEva features several advantages: (1) a structured label system
covering 65 extra-legal factors; (2) three core fairness metrics -- inconsistency, bias, and imbalanced inaccuracy; (3) robust statistical inference methods; and (4) informative visualizations. The
toolkit supports two types of experiments, enabling a complete evaluation workflow:
(1) generating structured outputs from LLMs using a provided dataset, and (2) conducting
statistical analysis and inference on LLMs' outputs through regression and other statistical
methods. Empirical application of JustEva reveals significant fairness deficiencies
in current LLMs, highlighting the lack of fair and trustworthy LLM legal tools. JustEva
offers a convenient tool and methodological foundation for evaluating and improving
algorithmic fairness in the legal domain. The toolkit is available for deployment
at https://github.com/KYSpring/ai_fairness_demo. A video demonstration of the toolkit
is available at https://drive.google.com/file/d/1lB2U3q-kI5B5frv8iqVceVaA9Yks3kE6/view?usp=sharing.
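As a hedged sketch of one fairness measurement in the spirit of the metrics listed above, the snippet below compares error rates of an LLM's legal predictions across groups defined by an extra-legal factor; the exact metric definitions are those of the toolkit, this is only an illustrative reading, and the field names are hypothetical.

```python
from collections import defaultdict

def imbalanced_inaccuracy(records, group_key="gender"):
    """records: iterable of dicts with 'prediction', 'label', and a group attribute."""
    errors, counts = defaultdict(int), defaultdict(int)
    for r in records:
        g = r[group_key]
        counts[g] += 1
        errors[g] += int(r["prediction"] != r["label"])
    rates = {g: errors[g] / counts[g] for g in counts}
    # Report the gap between the worst- and best-treated groups.
    return max(rates.values()) - min(rates.values()), rates

sample = [
    {"gender": "F", "prediction": "guilty", "label": "not_guilty"},
    {"gender": "F", "prediction": "guilty", "label": "guilty"},
    {"gender": "M", "prediction": "guilty", "label": "guilty"},
    {"gender": "M", "prediction": "not_guilty", "label": "not_guilty"},
]
gap, per_group = imbalanced_inaccuracy(sample)
print(gap, per_group)  # 0.5 {'F': 0.5, 'M': 0.0}
```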
Guess the Age of Photos: An Interactive Web Platform for Historical Image Age Estimation
- Hasan Yücedag
- Adam Jatowt
This paper introduces Guess the Age of Photos, a web platform engaging users in estimating the years of historical photographs
through two gamified modes: Guess the Year (predicting a single image's year) and Timeline Challenge (comparing two images to identify the older). Built with Python, Flask, Bootstrap,
and PostgreSQL, it uses a 10,150-image subset of the Date Estimation in the Wild dataset
(1930-1999). Features like dynamic scoring and leaderboards boost engagement. Evaluated
with 113 users and 15,473 gameplays, the platform earned a 4.25/5 satisfaction rating.
Users performed better at relative comparisons (65.9% accuracy) than at absolute year
guesses (25.6% accuracy), with older decades easier to identify. The platform serves as an
educational tool, fostering historical awareness and analytical skills via interactive
exploration of visual heritage. Furthermore, the platform provides a valuable resource
for studying human perception of temporal cues in images and could be used to generate
annotated data for training and evaluating computer vision models.
VoiceVisSystem: End-to-End Voice-driven Data Visualization Generation from Natural
Language Questions
- Haodi Zhang
- Xiaohui Tang
- Xinhe Zhang
- Jihua Zhou
- Yuanfeng Song
In today's digital era, data visualization (DV) technology has become indispensable
for tasks involving data processing and graphical reasoning. In this demonstration,
we introduce a novel automatic DV system named VoiceVisSystem. VoiceVisSystem is designed for transforming speech-form natural language questions
(NLQs) into visual data representations, a task formally known as Speech-to-Vis. Unlike existing cascaded methods (e.g., Sevi), the core component of our system
relies on an advanced end-to-end speech-to-vis model named SpeechVisNet, eliminating the need for text as an intermediate medium and directly facilitating
the conversion from speech to DV. Specifically, the speech encoder and the text
encoder of the SpeechVisNet respectively take the user's NLQs and the corresponding
database information as inputs and convert them into hidden representations. Then,
a grammar-based decoder generates the corresponding DVs as the output. As a result,
our system enjoys the benefits of avoiding error propagation, thereby enhancing accuracy.
By offering a seamless solution for the speech-to-vis task, VoiceVisSystem presents
a promising tool for practical applications in various domains. The demonstration
video is available at https://1drv.ms/v/s!Ah2vhbolPBFMiSNPZLunJ6Qp6jqU?e=Shyq8R.
CyberBOT: Ontology-Grounded Retrieval Augmented Generation for Reliable Cybersecurity
Education
- Chengshuai Zhao
- Riccardo De Maria
- Tharindu Kumarage
- Kumar Satvik Chaudhary
- Garima Agrawal
- Yiwen Li
- Jongchan Park
- Ying-Chih Chen
- Yuli Deng
- Huan Liu
Advancements in large language models (LLMs) have enabled the development of intelligent
educational tools that support inquiry-based learning across technical domains. In
cybersecurity education, where accuracy and safety are paramount, systems must go
beyond surface-level relevance to provide information that is both trustworthy and
domain-appropriate. To address this challenge, we introduce CyberBOT,
a question-answering chatbot that leverages a retrieval-augmented generation (RAG)
pipeline to incorporate contextual information from course-specific materials and
validate responses using a domain-specific cybersecurity ontology. The ontology serves
as a structured reasoning layer that constrains and verifies LLM-generated answers,
reducing the risk of misleading or unsafe guidance. CyberBOT has been deployed in
a large graduate-level course at Arizona State University (ASU), which illustrates
a promising direction for developing reliable and curriculum-aligned AI applications
in specialized educational contexts. Code: https://github.com/rccrdmr/CyberBOT.
Video: https://youtu.be/X2WorBxOQHo.
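The sketch below illustrates the idea of an ontology layer validating a RAG answer before it reaches students; it is not CyberBOT's code. `retrieve`, `generate`, and `extract_triples` are stand-ins for any retriever, LLM call, and triple extractor, and the ontology is modeled as a plain set of allowed (subject, relation, object) triples.

```python
ONTOLOGY = {
    ("nmap", "is_a", "network_scanner"),
    ("network_scanner", "used_for", "reconnaissance"),
}

def validate(answer_triples, ontology):
    """Flag an answer if it asserts a relation the ontology does not support."""
    unsupported = [t for t in answer_triples if t not in ontology]
    return len(unsupported) == 0, unsupported

def answer(question, retrieve, generate, extract_triples):
    passages = retrieve(question)                 # course-specific materials
    draft = generate(question, passages)          # LLM-generated draft answer
    ok, missing = validate(extract_triples(draft), ONTOLOGY)
    if ok:
        return draft
    # Fall back to a conservative response rather than unsafe guidance.
    return ("I could not verify part of that answer against the course "
            f"ontology ({missing}); please consult the lab handbook.")

# Toy usage with stand-in callables.
print(answer(
    "What is nmap used for?",
    retrieve=lambda q: ["Week 3 slides: nmap basics"],
    generate=lambda q, p: "nmap is a network scanner used for reconnaissance.",
    extract_triples=lambda a: {("nmap", "is_a", "network_scanner")},
))
```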
AGENTiGraph: A Multi-Agent Knowledge Graph Framework for Interactive, Domain-Specific
LLM Chatbots
- Xinjie Zhao
- Moritz Blum
- Fan Gao
- Yingjian Chen
- Boming Yang
- Luis Marquez-Carpintero
- Mónica Pina-Navarro
- Yanran Fu
- So Morikawa
- Yusuke Iwasawa
- Yutaka Matsuo
- Chanjun Park
- Irene Li
AGENTiGraph is a user-friendly, agent-driven system that enables intuitive interaction
and management of domain-specific data through the manipulation of knowledge graphs
in natural language. It gives non-technical users a complete, visual solution to incrementally
build and refine their knowledge bases, allowing multi-round dialogues and dynamic
updates without specialized query languages. The flexible design of AGENTiGraph, including
intent classification, task planning, and automatic knowledge integration, ensures
seamless reasoning between diverse tasks. Evaluated on a 3,500-query benchmark within
an educational scenario, the system outperforms strong zero-shot baselines (achieving
95.12% classification accuracy, 90.45% execution success), indicating potential scalability
to compliance-critical or multi-step queries in legal and medical domains, e.g., incorporating
new statutes or research on the fly. Our open-source demo offers a powerful new paradigm
for multi-turn enterprise knowledge management that bridges LLMs and structured graphs.
OntoLDiff: A Highly Efficient System for Tracking Logical Difference in Large-Scale
Ontologies
- Yizheng Zhao
- Renate A. Schmidt
Modern ontologies undergo continuous evolution to accommodate new domain knowledge,
correct modeling errors, and adapt to changing user requirements. Monitoring these
changes is crucial for maintaining ontology quality and understanding the semantic
impact of modifications on dependent systems and applications. This paper describes
OntoLDiff, a highly efficient system for tracking the logical difference between two
ontologies formulated in the description logic ELH. Intuitively, the logical difference between two versions of an ontology refers to
the set of axioms entailed by one version but not the other, indicating the information
gain or loss between them. Typically, such axioms, referred to as 'witnesses', can be infinite, making logical difference computation infeasible. To address
this challenge, OntoLDiff employs a Uniform Interpolation (UI) approach to compute a finite representation of these axioms. Instead of computing
all the entailments of one ontology but not the other, which would be computationally
infeasible, the UI-based approach focuses on identifying only the strongest entailments,
from whose deductive closure all witnesses can in principle be derived.
Despite UI's computational complexity, OntoLDiff is currently the only tool that efficiently
tracks logical differences in industrial-scale ontologies, enabling ontology curators
to precisely identify meaningful changes during ontology evolution.
AppAgent-Pro: A Proactive GUI Agent System for Multidomain Information Integration
and User Assistance
- Yuyang Zhao
- Wentao Shi
- Fuli Feng
- Xiangnan He
Large language model (LLM)-based agents have demonstrated remarkable capabilities
in addressing complex tasks, thereby enabling more advanced information retrieval
and supporting deeper, more sophisticated human information-seeking behaviors. However,
most existing agents operate in a purely reactive manner, responding passively to
user instructions, which significantly constrains their effectiveness and efficiency
as general-purpose platforms for information acquisition. To overcome this limitation,
this paper proposes AppAgent-Pro, a proactive GUI agent system that actively integrates
multi-domain information based on user instructions. This approach enables the system
to proactively anticipate users' underlying needs and conduct in-depth multi-domain
information mining, thereby facilitating the acquisition of more comprehensive and
intelligent information. AppAgent-Pro has the potential to fundamentally redefine
information acquisition in daily life, leading to a profound impact on human society.
Our code is available at: https://github.com/LaoKuiZe/AppAgent-Pro. The demonstration
video could be found at: https://www.dropbox.com/scl/fi/hvzqo5vnusg66srydzixo/AppAgent-Pro-demo-video.mp4?rlkey=o2nlfqgq6ihl125mcqg7bpgqu&st=d29vrzii&dl=0.
TrustMap: Mapping Truthfulness Stance of Social Media Posts on Factual Claims for
Geographical Analysis
- Zhengyuan Zhu
- Haiqi Zhang
- Zeyu Zhang
- Chengkai Li
Factual claims and misinformation circulate widely on social media, shaping public
opinion and decision-making. The concept of truthfulness stance refers to whether
a text affirms a claim as true, rejects it as false, or takes no clear position. Capturing
such stances is essential for understanding how the public engages with and propagates
misinformation. We present TrustMap, an application that identifies and visualizes
stances of tweets toward factual claims. Users may input factual claims or select
claims from a curated set. For each claim, TrustMap retrieves relevant social media
posts and applies a retrieval-augmented approach with fine-tuned language models to
classify stance. Posts are classified as positive, negative, or neutral/no stance.
These classifications are then aggregated by location to reveal regional variations
in public opinion. To enhance interpretability, TrustMap uses large language models
to generate stance explanations for individual posts and to produce regional stance
summaries. By integrating retrieval-augmented truthfulness stance detection with geographical
visualization, TrustMap provides the first tool of its kind for exploring how belief
in factual claims varies across regions.
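As an illustrative sketch of the regional aggregation step (the post schema and region field are assumptions, and the stance classifier is treated as a black box), the snippet below groups already-labeled posts by region and normalizes counts into shares so regions of different sizes are comparable.

```python
from collections import Counter, defaultdict

def regional_stance(posts):
    """posts: iterable of dicts with 'region' and 'stance' in
    {'positive', 'negative', 'neutral'}."""
    by_region = defaultdict(Counter)
    for p in posts:
        by_region[p["region"]][p["stance"]] += 1
    return {
        region: {s: c / sum(counts.values()) for s, c in counts.items()}
        for region, counts in by_region.items()
    }

posts = [
    {"region": "TX", "stance": "positive"},
    {"region": "TX", "stance": "negative"},
    {"region": "CA", "stance": "negative"},
]
print(regional_stance(posts))
```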
SESSION: PhD Symposium
Towards Rational Pesticide Design with Graph Machine Learning Models for Ecotoxicology
This research focuses on rational pesticide design, using graph machine learning to
accelerate the development of safer, eco-friendly agrochemicals, inspired by in silico
methods in drug discovery. With an emphasis on ecotoxicology, the initial contributions
include the creation of ApisTox, the largest curated dataset on pesticide toxicity
to honey bees. We conducted a broad evaluation of machine learning (ML) models for
molecular graph classification, including molecular fingerprints, graph kernels, GNNs,
and pretrained transformers. The results show that methods successful in medicinal
chemistry often fail to generalize to agrochemicals, underscoring the need for domain-specific
models and benchmarks. Future work will focus on developing a comprehensive benchmarking
suite and designing ML models tailored to the unique challenges of pesticide discovery.
Towards Trustworthy AI: Enhancing Factuality, Bias, and Compliance in LLMs
Large Language Models (LLMs) have demonstrated impressive capabilities across natural
language tasks, yet critical concerns persist regarding their factual reliability,
societal bias, and alignment with regulatory norms. Central to addressing these challenges
is the ability to systematically extract, normalize, and rank the claims made by LLMs-whether
factual, normative, or policy-relevant. However, existing approaches often assume
that claims are self-contained within individual sentences, overlooking the reality
that many important claims emerge only through multi-sentence context. This leads
to fragmented analysis and underestimates the complexity of model behavior. Furthermore,
current methods are limited in scope, often relying on narrow domains and fixed knowledge
sources, and struggle to identify or prioritize claims with potential for social harm
or legal noncompliance. By advancing methods for context-aware claim extraction, standardization
across sensitive attributes and regulatory categories, and risk-informed ranking,
this research aims to provide a more comprehensive foundation for evaluating and auditing
LLM outputs. Such a framework is essential for building systems that are not only
factually grounded, but also fair, transparent, and compliant in high-stakes applications.
Reasoning over Incomplete Knowledge Graphs
Incomplete knowledge graphs present a fundamental challenge for reliable multi-hop
knowledge graph question answering (KGQA), causing reasoning failures when key factual
triples are missing. While large language models (LLMs) offer strong reasoning capabilities
for KGQA, they are prone to hallucinations and often assume complete knowledge graphs
(KGs). This research identifies key bottlenecks in current LLM-KGQA pipelines caused
by KG incompleteness. We propose targeted remedies, centered on integrating link prediction
tools, to enhance performance in sparse KGs. We explore two main directions: (1) improving
the robustness of LLM-KGQA methods under KG sparsity, and (2) leveraging advanced
link prediction techniques to recover missing graph connections. Preliminary experiments
on benchmark datasets demonstrate significant improvements in both answer accuracy
and link prediction performance under simulated KG sparsity. These results bridge
the gap between LLM-based reasoning and incomplete KGs, laying the foundation for
more faithful and interpretable KGQA systems.
DeepEyeNet: Generating Medical Report for Retinal Images
The increasing prevalence of retinal diseases poses a significant challenge to the
healthcare system, as the demand for ophthalmologists surpasses the available workforce.
This imbalance creates a bottleneck in diagnosis and treatment, potentially delaying
critical care. Traditional methods of generating medical reports from retinal images
rely on manual interpretation, which is time-consuming and prone to errors, further
straining ophthalmologists' limited resources. This thesis investigates the potential
of Artificial Intelligence (AI) to automate medical report generation for retinal
images. AI can quickly analyze large volumes of image data, identifying subtle patterns
essential for accurate diagnosis. By automating this process, AI systems can greatly
enhance the efficiency of retinal disease diagnosis, reducing doctors' workloads and
enabling them to focus on more complex cases. The proposed AI-based methods address
key challenges in automated report generation: (1) A multi-modal deep learning approach
captures interactions between textual keywords and retinal images, resulting in more
comprehensive medical reports; (2) Improved methods for medical keyword representation
enhance the system's ability to capture nuances in medical terminology; (3) Strategies
to overcome RNN-based models' limitations, particularly in capturing long-range dependencies
within medical descriptions; (4) Techniques to enhance the interpretability of the
AI-based report generation system, fostering trust and acceptance in clinical practice.
These methods are rigorously evaluated using various metrics and achieve state-of-the-art
performance. This thesis demonstrates AI's potential to revolutionize retinal disease
diagnosis by automating medical report generation, ultimately improving clinical efficiency,
diagnostic accuracy, and patient care. DeepEyeNet Project Github: https://github.com/Jhhuangkay/DeepOpht-Medical-Report-Generation-for-Retinal-Images-via-Deep-Models-and-Visual-Explanation
Graph Neural Network Architecture Search via Hybrid Genetic Algorithm with Parallel
Tempering
In recent years, there has been a surge of interest in harnessing graph neural networks
(GNNs) for graph classification tasks. Despite the strong performance of manually
designed GNN architectures, their development often relies on time-consuming, expert-driven
trial and error, which may overlook promising design opportunities. To address this
challenge, we propose a hybrid Genetic Algorithm and Parallel Tempering (GA+PT) framework
for automated GNN architecture search. Our method systematically encodes a rich design
space-including convolutional layer types (GCN, GraphSAGE, GAT, GIN, SGC), attention
heads, hidden-layer widths, dropout rates, weight-initialization schemes, learning
rates, and classifier structures-into evolutionary genotypes. A population-based GA
explores this space via crossover and mutation, while an inner PT Markov Chain Monte
Carlo procedure adaptively refines individual solutions under a temperature schedule
to escape local optima. We validate our approach on benchmark graph datasets, demonstrating
its ability to discover high-performing architectures that balance classification
accuracy, macro-F1 score, and model complexity. The proposed GA+PT hybrid offers a
robust, scalable, and resource-aware alternative to manual GNN design and purely gradient-based
neural architecture search methods.
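A compact sketch of the GA+PT idea follows (not the authors' code): an outer genetic algorithm evolves GNN genotypes, while an inner tempering step perturbs each genotype at several temperatures and accepts changes with a Metropolis criterion; the inner loop here is a simplified, sequential stand-in for a true parallel-tempering chain, and `evaluate` is a placeholder for training the encoded GNN and returning validation accuracy.

```python
import math
import random

SPACE = {
    "conv": ["GCN", "GraphSAGE", "GAT", "GIN", "SGC"],
    "hidden": [32, 64, 128, 256],
    "dropout": [0.0, 0.2, 0.5],
    "lr": [1e-2, 1e-3, 1e-4],
}

def random_genotype():
    return {k: random.choice(v) for k, v in SPACE.items()}

def mutate(g):
    g = dict(g)
    k = random.choice(list(SPACE))
    g[k] = random.choice(SPACE[k])
    return g

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

def tempering_refine(g, evaluate, temps=(0.05, 0.2, 1.0), steps=10):
    """Metropolis-style local refinement of one genotype across a temperature ladder."""
    best, best_fit = g, evaluate(g)
    for t in temps:
        cur, cur_fit = best, best_fit
        for _ in range(steps):
            cand = mutate(cur)
            fit = evaluate(cand)
            if fit > cur_fit or random.random() < math.exp((fit - cur_fit) / t):
                cur, cur_fit = cand, fit
            if cur_fit > best_fit:
                best, best_fit = cur, cur_fit
    return best, best_fit

def search(evaluate, pop_size=8, generations=5):
    pop = [random_genotype() for _ in range(pop_size)]
    for _ in range(generations):
        refined = sorted((tempering_refine(g, evaluate) for g in pop),
                         key=lambda x: x[1], reverse=True)
        parents = [g for g, _ in refined[: pop_size // 2]]
        pop = parents + [crossover(*random.sample(parents, 2))
                         for _ in range(pop_size - len(parents))]
    return max(((g, evaluate(g)) for g in pop), key=lambda x: x[1])

# Toy fitness standing in for "train the encoded GNN and return validation accuracy".
toy_eval = lambda g: 0.5 + 0.1 * (g["conv"] == "GAT") + 0.001 * g["hidden"] / 256
print(search(toy_eval))
```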
Real until proven fake - Source-Level Audio Deepfake Detection (with PIPNet)
The rapid development of synthetic speech technologies poses significant challenges
to digital security and authenticity verification. This work investigates the use
of prototype-based neural networks for detecting and classifying audio deepfakes by
tracing them back to their generative source. Using time-frequency representations
of speech (spectrograms, mel-spectrograms, MFCC), we evaluated model performance across
multilingual and monolingual setups. Our results demonstrate that PIPNet reliably
distinguishes between real and synthetic speech and effectively identifies the source
TTS generator, making it a strong candidate for source-level attribution in audio
deepfake detection.
Investigating the Usage and Evaluation of Quantum Computing Technologies for Information
Access
Quantum Computing (QC) is an emerging research field that is attracting significant
interest from the scientific community. In fact, it is believed that quantum computers
can be employed to solve complex computational problems more efficiently than traditional
computers, due to their inherent capabilities of exploring large search spaces very
efficiently by leveraging the principles of quantum mechanics, such as superposition,
entanglement, and tunnelling. However, quantum computers are still in their early
stages of development and their applications are very limited, especially in the field
of Information Access (IA). Nevertheless, IA systems often face complex optimization
problems that might be solved more efficiently through the usage of quantum computers.
This paper outlines the author's PhD objectives in designing new methodologies for
the application and evaluation of QC technologies for IA problems. Furthermore, this
work provides an overview of the achieved preliminary results and a discussion of
possible future research directions.
The Landscape of Foundation Models for Molecular Chemistry
Pre-trained neural networks have recently emerged as powerful tools for molecular
data mining, offering an alternative to classical approaches. However, these models
are often evaluated on limited datasets with narrow baselines, leaving their benefits
unclear. We present the first large-scale benchmark comparing pre-trained molecular
embedding models across 20 public datasets spanning classification and regression
tasks. Our evaluation covers text-based, graph-based, and multimodal architectures,
all tested under a unified methodology. The results show that the classical fingerprint-based
models remain highly competitive. Only a few models consistently exceeded the baseline.
We also highlight key factors influencing model performance, offering practical guidance
for model selection and future improvements in molecular embeddings.
Eliminating Bias from Presentation Attack Detection Algorithms for Face Recognition
Systems
This paper presents an analysis of the fairness of presentation attack detection (PAD) algorithms
in face recognition systems. The study is the initial part of my Ph.D. research, with
the aim of developing robust and unbiased PAD methodologies. One of the main aspects
of this work involves the manual annotation of demographic groups within used biometric
datasets, as tested automated tools do not offer sufficient accuracy for such labeling
tasks. Another part of this work was the evaluation of two open-source PAD algorithms.
These experiments reveal performance disparities across different demographic groups,
highlighting the presence of algorithmic bias. Future stages of my research will explore
the impact of sensor characteristics, specifically near-infrared (NIR) imaging, on
demographic bias, with the ultimate goal of designing a bias-resilient PAD algorithm.
Explainable Numerical Claim Verification
The rapid proliferation of mis- and disinformation in the digital age highlights the
urgent need for scalable, transparent, and trustworthy automated fact-checking systems.
Large Language Models (LLMs) offer strong language understanding capabilities but
suffer from opacity and brittleness, particularly in reasoning over numerical claims.
This work explores how Explainable Artificial Intelligence (XAI)-through the lens
of counterfactual explanations and adversarial training-can be used to systematically
evaluate and improve the robustness of LLMs against perturbed numerical inputs. We
propose a framework that employs counterfactual generation to both probe LLM reliability
and generate user-appropriate explanations. Through empirical evaluations using a
large-scale numerical fact-checking dataset (QuanTemp), we show that even state-of-the-art
LLMs are susceptible to subtle numerical perturbations, impacting verdict accuracy.
Our methodology contributes a dual-purpose diagnostic and training strategy that not
only bolsters robustness but also enables both global and local interpretability-thereby
improving explainability in automated fact-checking systems.
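The probing idea above can be sketched as follows (assumed helper names, not the paper's code): perturb the numbers in a claim to create counterfactual variants, then check whether a verdict function changes its answer; a robust verifier should flip on meaning-changing perturbations and remain stable otherwise.

```python
import re

def numerical_counterfactuals(claim, factors=(0.5, 2.0, 10.0)):
    """Yield variants of `claim` with each number scaled by each factor."""
    numbers = re.findall(r"\d+(?:\.\d+)?", claim)
    for n in numbers:
        for f in factors:
            scaled = f * float(n)
            yield claim.replace(n, f"{scaled:g}", 1)

def probe(claim, verdict_fn):
    base = verdict_fn(claim)
    variants = [(v, verdict_fn(v)) for v in numerical_counterfactuals(claim)]
    return base, [c for c, v in variants if v != base]

# Example with a toy verdict function standing in for an LLM fact-checker.
toy = lambda c: "supported" if "7.8" in c else "refuted"
print(probe("Unemployment fell to 7.8 percent in 2012.", toy))
```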
Toward Robust Machine Learning under Diverse Incomplete Data Mechanisms in Real-World
Applications
Incomplete data is a pervasive challenge across a wide range of data types, including
tabular, sensor, time-series, image, and textual data. Its presence stems from various
real-world factors and gives rise to different missingness mechanisms. While much
of the existing research focuses on the Missing Completely At Random (MCAR) assumption,
the more complex and realistic mechanisms-Missing At Random (MAR) and Missing Not
At Random (MNAR)-remain relatively underexplored despite their prevalence and impact.
This PhD project aims to systematically investigate the challenges posed by diverse
incomplete data mechanisms and to develop robust machine learning methods that can
perform reliably across MCAR, MAR, and MNAR scenarios. The research spans multiple
data modalities and focuses on improving both the theoretical understanding and practical
handling of incomplete data. By addressing mechanism-specific imputation challenges
and proposing broadly applicable solutions, this work contributes to building more
resilient and trustworthy data-driven systems in real-world settings.
SESSION: Tutorials
Towards Large Generative Recommendation: A Tokenization Perspective
- Yupeng Hou
- An Zhang
- Leheng Sheng
- Jiancan Wu
- Xiang Wang
- Tat-Seng Chua
- Julian McAuley
The emergence of large generative models is transforming the landscape of recommender
systems. One of the most fundamental components in building these models is action
tokenization, the process of converting human-readable data (e.g., user-item interactions) into machine-readable formats (e.g., discrete token sequences). In this tutorial, we present a comprehensive overview
of existing action tokenization techniques, converting actions to (1) item IDs, (2)
textual descriptions, and (3) semantic IDs. We then make an in-depth discussion on
the challenges and open questions of building large generative recommendation models
from the perspective of action tokenization. Materials of this tutorial are available
at: https://large-genrec.github.io/.
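As a toy illustration of the three tokenization options named above, the sketch below renders the same interaction as an item ID, as text, and as a "semantic ID" obtained by two-level residual quantization against fixed random codebooks; real systems learn these codebooks (e.g., with an RQ-VAE), so the point here is only the interface of turning an interaction into a short sequence of discrete tokens.

```python
import numpy as np

rng = np.random.default_rng(0)
CODEBOOKS = [rng.normal(size=(256, 32)) for _ in range(2)]   # 2 levels x 256 codes

def semantic_id(item_embedding):
    """Map a 32-d item embedding to a 2-token semantic ID via residual quantization."""
    residual, codes = item_embedding, []
    for book in CODEBOOKS:
        idx = int(np.argmin(np.linalg.norm(book - residual, axis=1)))
        codes.append(idx)
        residual = residual - book[idx]
    return codes

interaction = {"user": 7, "item": "sku_981", "embedding": rng.normal(size=32)}
print("item-ID token :", interaction["item"])                     # option (1)
print("text tokens   :", "user 7 bought a red running shoe")      # option (2), illustrative
print("semantic ID   :", semantic_id(interaction["embedding"]))   # option (3)
```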
Socially Responsible and Trustworthy Generative Foundation Models: Principles, Challenges,
and Practices
- Yue Huang
- Canyu Chen
- Lu Cheng
- Bhavya Kailkhura
- Nitesh Chawla
- Xiangliang Zhang
Generative foundation models (GenFMs), including large language and multimodal models,
are transforming information retrieval and knowledge management. However, their rapid
adoption raises urgent concerns about social responsibility, trustworthiness, and
governance. This tutorial offers a comprehensive, hands-on overview of recent advances
in responsible GenFMs, covering foundational concepts, multi-dimensional risk taxonomies
(including safety, privacy, robustness, truthfulness, fairness, and machine ethics),
state-of-the-art evaluation benchmarks, and effective mitigation strategies. We integrate
real-world case studies and practical exercises using open-source tools, and present
key perspectives from both policy and industry, including recent regulatory developments
and enterprise practices. The session concludes with a discussion of open challenges,
providing actionable guidance for the CIKM community.
A Tutorial on Hypergraph Neural Networks: An In-Depth and Step-By-Step Guide
- Sunwoo Kim
- Soo Yong Lee
- Yue Gao
- Alessia Antelmi
- Mirko Polato
- Kijung Shin
Higher-order interactions (HOIs) are ubiquitous in real-world networks, such as group
discussions on online Q&A platforms, co-purchases of items in e-commerce, and collaborations
of researchers. Investigation of deep learning for networks of HOIs, expressed as
hypergraphs, has become an important agenda for the data mining and machine learning
communities. As a result, hypergraph neural networks (HNNs) have emerged as a powerful
tool for representation learning on hypergraphs. Given this emerging trend, we provide
a timely tutorial dedicated to HNNs. We cover the following topics: (1) inputs, (2)
message passing schemes, (3) training strategies, (4) applications (e.g., recommender
systems and time series analysis), and (5) open problems of HNNs. This tutorial is
intended for researchers and practitioners who are interested in hypergraph representation
learning and its applications.
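A minimal sketch of the two-stage message passing most HNNs share is given below: node features are first aggregated into hyperedge messages, then hyperedge messages are aggregated back to the nodes. Learnable weights and nonlinearities are omitted; this illustrates the scheme rather than any specific model, and assumes no isolated nodes or empty hyperedges.

```python
import numpy as np

def hypergraph_message_passing(X, H):
    """X: node features (|V| x d); H: incidence matrix (|V| x |E|), H[v, e] = 1
    when node v belongs to hyperedge e."""
    node_deg = H.sum(axis=1, keepdims=True)   # number of hyperedges per node
    edge_deg = H.sum(axis=0, keepdims=True)   # number of nodes per hyperedge
    edge_msgs = (H.T @ X) / edge_deg.T        # stage 1: nodes -> hyperedges (mean)
    new_X = (H @ edge_msgs) / node_deg        # stage 2: hyperedges -> nodes (mean)
    return new_X

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # 3 nodes, 2 features
H = np.array([[1, 0], [1, 1], [0, 1]], dtype=float)  # 2 hyperedges: {0,1}, {1,2}
print(hypergraph_message_passing(X, H))
```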
Generative Models for Synthetic Data: Transforming Data Mining in the GenAI Era
- Dawei Li
- Yue Huang
- Ming Li
- Tianyi Zhou
- Xiangliang Zhang
- Huan Liu
Generative models such as Large Language Models, Diffusion Models, and generative
adversarial networks have recently revolutionized the creation of synthetic data,
offering scalable solutions to data scarcity, privacy, and annotation challenges in
data mining. This tutorial introduces the foundations and latest advances in synthetic
data generation, covers key methodologies and practical frameworks, and discusses
evaluation strategies and applications. Attendees will gain actionable insights into
leveraging generative synthetic data to enhance data mining research and practice.
More information can be found on our website: https://syndata4dm.github.io/.
Neural Differential Equations for Continuous-Time Analysis
- Yongkyung Oh
- Dongyoung Lim
- Sungil Kim
Modeling complex, irregular time series is a critical challenge in knowledge discovery
and data mining. This tutorial introduces Neural Differential Equations (NDEs)--a
powerful paradigm for continuous-time deep learning that intrinsically handles the
non-uniform sampling and missing values where traditional models falter. We provide
a comprehensive review of the theory and practical application of the entire NDE family:
Neural Ordinary (NODEs), Controlled (NCDEs), and Stochastic (NSDEs) Differential Equations.
The tutorial emphasizes robustness and stability and culminates in a hands-on session
where participants will use key open-source libraries to solve real-world tasks like
interpolation and classification. Designed for AI researchers and practitioners, this
tutorial equips attendees with essential tools for time series analysis.
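A minimal Neural ODE sketch, assuming nothing beyond PyTorch, is shown below: the hidden state evolves as dh/dt = f_theta(h, t), integrated here with explicit Euler steps. The tutorial's hands-on session relies on dedicated libraries with adaptive, differentiable solvers; this toy version only illustrates the formulation and why irregular observation times pose no difficulty.

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Parameterizes the vector field dh/dt = f_theta(h, t)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, h, t):
        t_col = torch.full((h.shape[0], 1), t)        # append time as an input feature
        return self.net(torch.cat([h, t_col], dim=-1))

def odeint_euler(func, h0, times, steps_per_interval=10):
    """Integrate h across the given times, returning the state at each time."""
    h, out = h0, [h0]
    for t0, t1 in zip(times[:-1], times[1:]):
        dt = (t1 - t0) / steps_per_interval
        t = t0
        for _ in range(steps_per_interval):
            h = h + dt * func(h, t)
            t = t + dt
        out.append(h)
    return torch.stack(out)

func = ODEFunc(dim=3)
h0 = torch.zeros(1, 3)
times = [0.0, 0.3, 1.1, 1.2]            # irregularly spaced observation times
trajectory = odeint_euler(func, h0, times)
print(trajectory.shape)                  # (4, 1, 3)
```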
Retrieval of Graph Structured Objects: Theory and Applications
- Indradyumna Roy
- Soumen Chakrabarti
- Abir De
Graph-structured data is ubiquitous across diverse domains like social networks, search,
question answering, and drug discovery. Effective retrieval of (sub-)graphs with relevant
substructures has become critical to the success of these applications. This proposed
tutorial will introduce attendees to state-of-the-art neural methods for graph retrieval,
highlighting architectures that effectively model relevance through innovative combinations
of early and late interaction mechanisms.
Participants will explore relevance models that represent graphs as sets of embeddings,
enabling alignment-driven similarity scoring between query and corpus graphs and supporting
diverse cost functions, both symmetric and asymmetric. We will also discuss compatibility
with Approximate Nearest Neighbor (ANN) methods, covering recent advances in locality-sensitive
hashing (LSH) and other indexing techniques that significantly enhance scalability
in graph retrieval.
The tutorial includes hands-on experience with an accessible, PyTorch-integrated toolkit
that provides downloadable graph retrieval datasets and baseline implementations of
recent methods. Participants will learn to adapt these methods for multi-modal applications
--- such as molecule, text, and image retrieval --- where graph-based retrieval proves
particularly effective. Designed for researchers and practitioners, this session delivers
both foundational concepts and practical tools for implementing and scaling neural
graph retrieval solutions across interdisciplinary applications.
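The "graphs as sets of embeddings" relevance model mentioned above can be sketched as follows: each graph is a matrix of node embeddings, and relevance is an alignment score between the two sets. The asymmetric variant rewards corpus graphs that cover every query node well (a subgraph-containment flavour); this is an illustration of the idea, not any specific published scoring function.

```python
import numpy as np

def alignment_score(query_emb, corpus_emb, asymmetric=True):
    """query_emb: (nq, d), corpus_emb: (nc, d); rows assumed L2-normalized."""
    sims = query_emb @ corpus_emb.T      # (nq, nc) pairwise node similarities
    best_match = sims.max(axis=1)        # best corpus node for each query node
    if asymmetric:
        return best_match.mean()         # every query node must be covered
    covered = sims.max(axis=0)           # and, symmetrically, vice versa
    return 0.5 * (best_match.mean() + covered.mean())

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 16)); q /= np.linalg.norm(q, axis=1, keepdims=True)
c = rng.normal(size=(9, 16)); c /= np.linalg.norm(c, axis=1, keepdims=True)
print(alignment_score(q, c))
```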
Neural Shifts in Collaborative Team Recommendation
- Mahdis Saeedi
- Hossein Fani
Team recommendation involves selecting skilled experts to form an almost surely successful collaborative team, or refining the team composition to maintain or improve
performance. To eschew the tedious and error-prone manual process, various computational
and social science theoretical approaches have been proposed wherein the problem definition
remains essentially the same, while it has been referred to by such other names as
team allocation, selection, composition, and formation. In this tutorial, we study
the advancement of computational approaches from greedy search in pioneering works
to the recent learning-based approaches, with a particular in-depth exploration of
graph neural network-based methods as the cutting-edge class, via unifying definitions,
formulations, and evaluation schema. More importantly, we then discuss team refinement, a subproblem in team recommendation that involves structural adjustments or expert
replacements to enhance team performance in dynamic environments. Finally, we introduce
training strategies, benchmarking datasets, and open-source tools, along with future
research directions and real-world applications. The tutorial artifacts can be found
at https://fani-lab.github.io/OpeNTF/tutorial/cikm25.
Fairness in Language Models: A Tutorial
- Zichong Wang
- Avash Palikhe
- Zhipeng Yin
- Wenbin Zhang
Language Models (LMs) achieve outstanding performance across diverse applications
but often produce biased outcomes, raising concerns about their trustworthy deployment.
These concerns call for fairness research specific to LMs; however, most existing
work in machine learning assumes access to model internals or training data, conditions
that rarely hold in practice. As LMs continue to exert growing societal influence,
it becomes increasingly important to understand and address fairness challenges unique
to these models. To this end, our tutorial begins by showcasing real-world examples
of bias to highlight their practical implications and uncover underlying sources.
We then define fairness concepts tailored to LMs, review methods for bias evaluation
and mitigation, and present a multi-dimensional taxonomy of benchmark datasets for
fairness assessment. We conclude by outlining open research challenges, aiming to
provide the community with both conceptual clarity and practical tools for fostering
fairness in LMs. All tutorial resources are publicly accessible at https://github.com/vanbanTruong/fairness-in-large-language-models.
Uncertain Boundaries: A Tutorial on Copyright Challenges and Cross-Disciplinary Solutions
for Generative AI
- Zhipeng Yin
- Zichong Wang
- Avash Palikhe
- Wenbin Zhang
As generative artificial intelligence (AI) becomes increasingly prevalent in creative
industries, intellectual property issues have come to the forefront, especially regarding
AI-generated content that closely resembles human-created works. Recent high-profile
incidents involving AI-generated outputs reproducing copyrighted materials underscore
the urgent need to reassess current copyright frameworks and establish effective safeguards
against infringement. To this end, this tutorial provides a structured overview of
copyright challenges in generative AI across the entire development lifecycle. It
begins by outlining key copyright principles relevant to generative models, then explores
methods for detecting and evaluating potential infringement in generated outputs.
The session also introduces strategies to safeguard creative content and training
data from unauthorized replication, including mitigation techniques during model training.
Finally, it reviews existing regulatory frameworks, highlights unresolved research
questions, and offers recommendations to guide future work in this evolving area.
Continual Recommender Systems
- Hyunsik Yoo
- Seongku Kang
- Hanghang Tong
Modern recommender systems operate in uniquely dynamic settings: user interests, item
pools, and popularity trends shift continuously, and models must adapt in real time
without forgetting past preferences. While existing tutorials on continual or lifelong
learning cover broad machine learning domains (e.g., vision and graphs), they do not address recommendation-specific demands-such as
balancing stability and plasticity per user, handling cold-start items, and optimizing
recommendation metrics under streaming feedback. This tutorial aims to make a timely
contribution by filling that gap. We begin by reviewing the background and problem
settings, followed by a comprehensive overview of existing approaches. We then highlight
recent efforts to apply continual learning to practical deployment environments, such
as resource-constrained systems and sequential interaction settings. Finally, we discuss
open challenges and future research directions. We expect this tutorial to benefit
researchers and practitioners in recommender systems, data mining, AI, and information
retrieval across academia and industry.
SESSION: Industry Day Talks
Autoregressive Generative Retrieval for Industrial-Scale Recommendations at Pinterest
- Prabhat Agarwal
- Anirudhan Badrinath
- Laksh Bhasin
- Jaewon Yang
- Jiajing Xu
- Charles Rosenberg
Generative retrieval methods utilize generative sequential modeling techniques, such
as transformers, to generate candidate items for recommender systems. These methods
have demonstrated promising results in academic benchmarks, surpassing traditional
retrieval models like two-tower architectures. However, current generative retrieval
methods lack the scalability required for industrial recommender systems, and they
are insufficiently flexible to satisfy the multiple metric requirements of modern
systems. This talk introduces PinRec, a novel generative retrieval model developed
for applications at Pinterest. PinRec utilizes outcome-conditioned generation, enabling
modelers to specify how to balance various outcome metrics, such as the number of
saves and clicks, to effectively align with business goals and user exploration. Additionally,
PinRec incorporates multi-token generation to enhance output diversity while optimizing
generation. Our experiments demonstrate that PinRec can successfully balance performance,
diversity, and efficiency, delivering a significant positive impact to users, including
+2% sitewide clicks and +4% search repins. This talk marks a significant milestone in
generative retrieval, as it presents, to our knowledge, the first in-depth study on
productionizing generative retrieval at the scale of Pinterest.
Building Trustworthy Peer Review Quality Assessment Systems
- Negar Arabzadeh
- Sajad Ebrahimi
- Ali Ghorbanpour
- Soroush Sadeghian
- Sara Salamat
- Muhan Li
- Hai Son Le
- Mahdi Bashari
- Ebrahim Bagheri
Peer review is foundational to academic publishing, yet the quality of reviews remains
difficult to assess at scale due to subjectivity, inconsistency, and the lack of standardized
evaluation mechanisms. This talk presents our experience developing and deploying
a scalable framework for assessing review quality in operational settings. We combine
two complementary approaches: interpretable machine learning models built on quantifiable
review- and reviewer-level features, and the application of large language models
(LLMs), including Qwen, Phi, and GPT-4o, in zero- and few-shot configurations for
textual quality evaluation. We also explore the fine-tuning of LLMs on expert-annotated
datasets to examine their upper-bound capabilities. To benchmark these methods, we
constructed a dataset of over 700 paper-review pairs labeled by domain experts across
multiple quality dimensions. Our findings demonstrate that transparent, feature-based
models consistently outperform LLMs in reliability and generalization, particularly
when evaluating conceptual depth and argumentative structure. The talk will highlight
key engineering choices, deployment challenges, and broader implications for integrating
automated review evaluation into scholarly workflows.
Safeguarding Generative AI Applications in Preclinical Imaging through Hybrid Anomaly
Detection
- Jakub Binda
- Valentina Paneta
- Vasileios Eleftheriadis
- Hongkyou Chung
- Panagiotis Papadimitroulas
- Neo Christopher Chung
Generative artificial intelligence (GenAI) holds great potential to automate and
enhance data synthesis in nuclear medicine. However, the high-stakes nature of biomedical
imaging necessitates robust mechanisms to detect and manage unexpected or erroneous
model behavior. We report the development and implementation of a hybrid anomaly detection
framework to safeguard GenAI models in BIOEMTECH's eyes™ systems. Two applications are demonstrated: Pose2Xray, which generates synthetic
X-rays from photographic mouse images, and DosimetrEYE, which estimates 3D radiation
dose maps from 2D SPECT/CT scans. In both cases, our outlier detection (OD) enhances
reliability, reduces manual oversight, and supports real-time quality control. This
approach strengthens the industrial viability of GenAI in preclinical settings by
increasing robustness, scalability, and regulatory compliance.
Motion-Based Bird-UAV Classification Using 3D-CNN for Long-Range Anti-UAV Systems
- Woo-Choel Jin
- Daegun Oh
- Sang-Chul Lee
- Ji-Woong Choi
The increasing threat of malicious unmanned aerial vehicles (UAVs) necessitates robust
anti-UAV systems. However, their performance is often degraded by bird misclassification
caused by low-resolution imagery and unseen UAV types. This study proposes a motion-based
3D convolutional neural network (3D-CNN) trained on image sequences acquired from
a radar-camera integrated anti-UAV solution. The proposed method effectively distinguishes
UAVs from birds, even under low-resolution conditions and when encountering previously
unseen UAV types.
Taming the Unicorn: Turning Generative AI Into a Workhorse - How to Draw Boundaries,
Handle Hallucinations, and Make AI Behave in Product
- Marios Kokkodis
- Purusoth Mahendran
- Grace Boatwright
This work illustrates the process of turning speculative AI prototypes into reliable,
production-grade systems. In particular, we discuss how we redesigned our search engine
from keyword-based matching to a generative AI framework that interprets natural language
queries and engages users through dynamic question answering, to provide a personalized
experience. To build this framework, we created highly cross-functional processes,
where content design, user experience research, engineering, and science worked iteratively
in unison. In short, we do not chase unicorns; we put them to work safely and predictably.
ROI Scan: LLM-powered Object-level Similarity Search for Google Ads Content Moderation
- Enming Luo
- Yintao Liu
- Dongjin Kwon
- Rich Munoz
- Wei Qiao
- Nic Trieu
- Eric Xiao
- Jimin Li
- Laurel Graham
- Ariel Fuxman
Google's ad platform reviews a massive volume of ads daily, aiming to maintain user
trust and platform integrity. However, malicious advertisers often bypass the existing
detection by subtly manipulating suspended ads, typically by placing a policy-violating
region of interest (ROI) into diverse images with varied backgrounds and sizes. To
counter this, we introduce ROI Scan, a novel two-stage object-level similarity search
approach. Stage 1 precisely identifies and extracts the problematic ROIs from escalated
ad images using LLM-powered suggestions, refined by human expert review. Stage 2 then
matches these ROIs against a massive database of billions of objects extracted from
production images. Our experiments on the Online Gambling policy demonstrate ROI Scan's
effectiveness, achieving 89.1% relative recall and 83.8% incremental coverage over
the full-image search baseline, with nearly 100% precision. In production, ROI Scan
prevents hundreds of millions of policy violation impressions and blocks hundreds
of thousands of bad ads weekly.
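The second stage of the approach above, reduced to its core, is an object-level similarity search: the embedding of an escalated ROI is matched against a database of object embeddings. In the sketch below the "database" is a small in-memory matrix searched exactly with cosine similarity; at production scale this would be an approximate nearest-neighbor index over billions of objects.

```python
import numpy as np

def top_matches(roi_emb, db_embs, db_ids, threshold=0.9, k=5):
    """Return up to k database objects whose cosine similarity exceeds the threshold."""
    roi = roi_emb / np.linalg.norm(roi_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = db @ roi
    order = np.argsort(-sims)[:k]
    return [(db_ids[i], float(sims[i])) for i in order if sims[i] >= threshold]

rng = np.random.default_rng(7)
db_embs = rng.normal(size=(1000, 64))
db_ids = [f"object_{i}" for i in range(1000)]
roi_emb = db_embs[42] + 0.05 * rng.normal(size=64)   # a near-duplicate of object_42
print(top_matches(roi_emb, db_embs, db_ids))
```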
Using Large Language Models to Improve Product Information in E-commerce Catalogs
- Gang Luo
- Julien Han
- Hayreddin Ceker
- Karim Bouyarmane
To give customers a good experience, an e-commerce retailer needs high-quality product
information in its catalog. Yet, the raw product information often lacks sufficient
quality. For a large catalog that can contain billions of products, manually fixing
this information is highly labor-intensive. To address this issue, we propose using
the tool use functionality of large language models to automatically improve product
information. In this talk, we show why existing data cleaning methods are not well
suited for this task and how we designed our automated system to improve product information.
When evaluated on a random sample of products from an e-commerce catalog, our system
improved product information completeness by 78% with no major drop in information
accuracy.
Semantic Filter Recommendation for eCommerce Search
- Kilian Merkelbach
- Antonino Freno
We present an application of encoder-only Transformers to the task of recommending
filters to eCommerce search queries. In particular, we operate in a dynamic setup
where new recommendations are computed online whenever the user selects one or more
filters, conditioned on the search query and the filters selected so far. Our method
leverages the world knowledge imparted into a pretrained model, setting it apart from
purely memory-based or statistical models, which we use as baselines for evaluation.
We review experimental results on offline benchmarks using data generated from eBay
search logs, comparing the performance of the proposed model to the baselines. The
results show a significant increase in filter recommendation accuracy, as measured
by NDCG.
LLM-Driven Attributes Extraction in eCommerce
- Ksenia Riabinova
- Kilian Merkelbach
Aspect extraction - the task of identifying attributes such as model, color, or size
from textual entities like question-answer pairs - is one of the key tasks in eCommerce.
Given the large number of possible aspects per entity, retrieving the most relevant
ones is challenging. Traditional methods rely on high-quality labeled data for training,
which is costly to obtain at scale. In this work, we propose a training-free aspect
extraction approach using LLMs. Leveraging in-context learning and a novel Forward-Backward
method that combines retrieval-augmented generation (RAG) with embedding-based matching,
our method effectively extracts relevant aspects from text without requiring training
data.
Google Ads Content Moderation with RAG
- Yuan Wang
- Wei Qiao
- Jingxiang Li
- Tiantian Fang
- Eric Xiao
- Megan Oftelie
- Zhimin Wang
- Yintao Liu
- Jimin Li
- Yi-Ting Chen
- Zhongli Ding
- Enming Luo
Keeping ad content policy classifiers up to date while maintaining the high quality
bar is a significant challenge, especially with new threats emerging constantly. This
paper introduces a new application of RAG-inspired in-context learning to accelerate
content policy enforcement, especially when mitigating newly emerging violations. Our
application leverages RAG-based LLM inference for classification tasks and incorporates
augmented reasoning information for better performance. We also developed a practical
framework to enforce new violation patterns in O(1) days, demonstrating improved memorization
and generalization capabilities compared to traditional parametric and non-parametric
models.
TransAct V2: Lifelong User Action Sequence Modeling on Pinterest Recommendation
- Xue Xia
- Saurabh Joshi
- Kousik Rajesh
- Kangnan Li
- Yangyi Lu
- Nikil Pancha
- Dhruvil Badani
- Jiajing Xu
- Pong Eksombatchai
Modeling user action sequences has become a popular focus in industrial recommendation
system research, particularly for Click-Through Rate (CTR) prediction tasks. However,
industry-scale CTR models often rely on short user sequences, limiting their ability
to capture long-term behavior. They also rarely address the infrastructure challenges
involved in efficiently serving large-scale sequential models. Additionally, these
models typically lack an integrated action-prediction task within a point-wise ranking
framework, reducing their predictive power. We introduce TransAct V2, a production
model for Pinterest's Homefeed ranking system, featuring three key innovations: (1)
leveraging very long user sequences to improve CTR predictions, (2) employing scalable,
low-latency deployment solutions tailored to handle the computational demands of extended
user action sequences, and (3) integrating a Next Action Loss function for enhanced
user action forecasting. To overcome latency and storage constraints, we leverage
efficient data-processing strategies and model-serving optimizations, enabling seamless
industrial-scale deployment. Our approach's effectiveness is further demonstrated
through ablation studies. Furthermore, extensive offline and online A/B experiments
confirm major gains in key metrics, including engagement volume and recommendation
diversity, showcasing TransAct V2's real-world impact.
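For illustration, a minimal PyTorch sketch of how an auxiliary next-action prediction objective can be combined with a point-wise CTR loss over a user action sequence. The encoder choice, dimensions, and loss weighting are assumptions for the sketch, not TransAct V2's production architecture.

```python
# Minimal sketch: a point-wise CTR objective combined with an auxiliary
# next-action prediction loss over a user's action sequence. Encoder choice,
# dimensions, and the loss weight are assumptions, not TransAct V2 itself.
import torch
import torch.nn as nn

class SketchRanker(nn.Module):
    def __init__(self, num_actions=10, dim=64):
        super().__init__()
        self.action_emb = nn.Embedding(num_actions, dim)
        encoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                                   batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.ctr_head = nn.Linear(2 * dim, 1)            # user state + candidate
        self.next_action_head = nn.Linear(dim, num_actions)

    def forward(self, action_seq, candidate_emb):
        h = self.encoder(self.action_emb(action_seq))    # (B, T, dim)
        user_state = h[:, -1]                            # summary at last position
        ctr_logit = self.ctr_head(torch.cat([user_state, candidate_emb], -1))
        next_action_logits = self.next_action_head(user_state)
        return ctr_logit.squeeze(-1), next_action_logits

model = SketchRanker()
seq = torch.randint(0, 10, (8, 20))          # batch of 8 users, 20 past actions
cand = torch.randn(8, 64)                    # candidate item embeddings
click = torch.randint(0, 2, (8,)).float()    # observed click labels
next_action = torch.randint(0, 10, (8,))     # the action that actually followed

ctr_logit, na_logits = model(seq, cand)
loss = nn.functional.binary_cross_entropy_with_logits(ctr_logit, click) \
       + 0.3 * nn.functional.cross_entropy(na_logits, next_action)
loss.backward()
```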
AutoRuleSQL: Hybrid Text-to-SQL via Rule-Driven Fast Paths and LLM Bootstrapping
- Han Xu
- Yang Li
- Yanhai Xiong
- Robert Mintern
- Amir Louka
- Haipeng Chen
Natural Language to SQL (NL2SQL) enables natural language access to structured data,
but LLM-based methods can be inefficient for real-time use and repetitive query patterns.
We present AutoRuleSQL, a hybrid system that combines template-based fast paths with
LLM fallback and offline bootstrapping. Empirical results show that it reduces latency
by over 12.6% and improves execution accuracy by up to 4.0%, when combined with existing
NL2SQL methods.
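For illustration, a minimal sketch of the hybrid routing the abstract describes: cheap rule/template fast paths are tried first, and the LLM is invoked only when no rule matches. The regex rules, table schema, and the stubbed fallback are hypothetical, and the offline bootstrapping of new templates is not shown.

```python
# Illustrative sketch of the hybrid pattern: try rule/template fast paths
# first, fall back to an LLM only when no rule matches. Patterns, schema, and
# the fallback stub are hypothetical, not AutoRuleSQL itself.
import re

RULES = [
    # (pattern over the question, SQL template)
    (re.compile(r"how many orders did customer (\w+) place", re.I),
     "SELECT COUNT(*) FROM orders WHERE customer_name = '{0}';"),
    (re.compile(r"total revenue in (\d{4})", re.I),
     "SELECT SUM(amount) FROM orders WHERE strftime('%Y', order_date) = '{0}';"),
]

def llm_fallback(question):
    # A real system would call an NL2SQL-capable LLM here; stubbed for the sketch.
    return f"-- LLM fallback would translate: {question}"

def to_sql(question):
    for pattern, template in RULES:
        match = pattern.search(question)
        if match:                        # fast path: fill the template
            return template.format(*match.groups())
    return llm_fallback(question)        # slow path: defer to the LLM

print(to_sql("How many orders did customer alice place?"))
print(to_sql("Which products were returned most often last quarter?"))
```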
Reliable and Efficient Container Orchestration of LLMs via MCP
- Han Xu
- Xingyuan Wang
- Shihao Ji
This paper presents a structured decoding approach to support reliable and efficient
container orchestration using large language models (LLMs) in conjunction with the
Model Context Protocol (MCP), a standard interface for LLMs to interact with Docker
and Kubernetes. We address key challenges in using LLMs for container orchestration:
high token overhead from outputs and the risk of generating invalid or unsafe commands.
Empirical results demonstrate up to a 76.2% latency reduction.
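For illustration, a minimal sketch of the structured-decoding idea: the model emits a compact, schema-constrained action rather than a free-form shell command, and the action is validated against an allowlist before anything is executed. The schema, verbs, and namespace policy are hypothetical, not the paper's MCP integration.

```python
# Illustrative sketch: validate an LLM's proposed container action against a
# compact schema and an allowlist before execution. Schema, allowed verbs, and
# namespaces are hypothetical, not the paper's MCP setup.
from typing import Literal
from pydantic import BaseModel, ValidationError, field_validator

class ContainerAction(BaseModel):
    verb: Literal["scale", "restart", "get_logs"]   # only safe, known verbs
    resource: str                                    # e.g. a deployment name
    namespace: str = "default"
    replicas: int | None = None

    @field_validator("namespace")
    @classmethod
    def namespace_allowed(cls, v):
        if v not in {"default", "staging"}:          # never touch production here
            raise ValueError(f"namespace '{v}' is not allowed")
        return v

def parse_llm_action(raw_json: str) -> ContainerAction | None:
    try:
        return ContainerAction.model_validate_json(raw_json)
    except ValidationError as err:
        print("Rejected unsafe or malformed action:", err)
        return None

# Compact structured output instead of a free-form shell command.
print(parse_llm_action('{"verb": "scale", "resource": "web", "replicas": 3}'))
print(parse_llm_action('{"verb": "delete", "resource": "db"}'))  # rejected
```

Constraining output to such a schema both cuts token overhead and makes unsafe commands rejectable before they reach Docker or Kubernetes.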
SESSION: Workshops
DESERE: The 2nd Workshop on Decentralized Search and Recommendation
- Thanassis Tiropanis
- George Roussos
- Mohammad Bahrani
- Mohamed Ragab
The growing demand for data ownership and privacy is reshaping how information is
accessed, managed, integrated, and recommended. Building on the inaugural DESERE workshop
at The Web Conference 2024, this second edition advances research on Decentralised
Search and Recommendation platforms such as Personal Online Datastores (PODs), where
users retain control of their data and explicitly manage permissions. As ecosystems
decentralise, traditional information retrieval must be revisited while standards
for new techniques and system designs are developed to ensure efficient, accurate,
and privacy-preserving search. The Second DESERE workshop at CIKM 2025 focuses on
infrastructures and retrieval algorithms for user-controlled data. It convenes a cross-disciplinary
community spanning data retrieval, management and integration, semantic technologies,
recommendation systems, privacy-aware computing, and search efficiency to explore
approaches that prioritize user agency, data ownership, and scalable retrieval across
PODs and related architectures. Through paper presentations, panels, and interactive
sessions, the workshop will highlight challenges, opportunities, and solutions for
privacy-preserving IR. These discussions are especially relevant to domains where
user-centric design and data stewardship are critical, such as personal finance, education,
and high-stakes areas like criminal justice and health.
International Workshop on Multimodal Generative Search and Recommendation (MMGenSR@CIKM
2025)
- Yi Bin
- Haoxuan Li
- Haokai Ma
- Yang Zhang
- Wang Wenjie
- Yunshan Ma
- Yang Yang
- Tat-Seng Chua
Recent breakthroughs in generative Artificial Intelligence (AI) have ignited a revolutionary
wave across information retrieval and recommender systems. This workshop serves as
a premier interdisciplinary platform to explore how generative models, particularly
Large Language Models (LLMs) and Large Multimodal Models (LMMs), are transforming
multimodal search and recommendation paradigms [3, 6, 9, 10, 12-14]. We aim to convene
researchers and practitioners to discuss innovative architectures, methodologies,
and evaluation strategies spanning generative document retrieval [5, 8], generative
image retrieval [7, 16], grounded answer generation [17], generative recommendation
[2, 4, 11], and related tasks involving multiple modalities [1, 15]. The workshop will
facilitate discussions on improving algorithms, generating personalized content, evolving
user-system interactions, enhancing trustworthiness, and refining evaluation methodologies
for these cutting-edge systems. This timely workshop seeks to identify promising research
directions, address key challenges, and foster collaborations towards the development
of next-generation intelligent systems.
ProActLLM: Proactive Conversational Information Seeking with Large Language Models
- Shubham Chatterjee
- Xi Wang
- Shuo Zhang
- Sajad Ebrahimi
- Zhaochun Ren
- Debasis Ganguly
- Gareth Jones
- Emine Yilmaz
- Hamed Zamani
Large Language Models (LLMs) have transformed information access by enabling human-like
text understanding and generation. This workshop explores the next step for conversational
AI: building proactive information-seeking assistants that go beyond reactive question
answering. We aim to investigate how LLMs can anticipate user needs, model complex
context, support mixed-initiative interactions, integrate retrieval and external tools,
personalize responses, adapt through feedback, and ensure fairness, transparency,
and cognitive grounding. Bringing together experts from NLP, IR, HCI, and cognitive
science, the workshop will serve as a timely forum for advancing intelligent, proactive
dialogue systems. It will also foster interdisciplinary collaboration.
Human-Centric AI: From Explainability and Trustworthiness to Actionable Ethics
- Jaesik Choi
- Bohyung Han
- Myoung-Wan Koo
- Kyungman Bae
- Chang D. Yoo
- Simon S. Woo
- Wojciech Samek
To address the potential risks of AI while supporting innovation and ensuring responsible
adoption, there is an urgent need for clear governance frameworks grounded in human-centric
values. It is imperative that AI systems operate in ways that are transparent, trustworthy,
and ethically sound. Developing truly human-centric AI goes beyond technical innovation.
It requires interdisciplinary collaboration and diverse perspectives. This workshop
will explore key challenges and emerging solutions in the development of human-centric
AI, with a focus on explainability, trustworthiness, fairness, and privacy. We welcome
both theoretical contributions and practical case studies that demonstrate how human-centered
principles are realized in real-world AI systems. The official workshop webpage is
available at https://xai.kaist.ac.kr/Workshop/hcai2025/, which provides comprehensive
information about the program.
Advances in Medical Knowledge Systems: LLMs, RAG and Foundation Models
- Giulia Di Teodoro
- Valerio Guarrasi
- Federico Siciliano
- Fabrizio Silvestri
This workshop will explore the latest approaches to medical knowledge systems, with
a focus on the synergy between large language models, retrieval-augmented generation,
and foundation/agentic models. The workshop will promote interdisciplinary collaboration
among researchers, practitioners, and clinicians to advance evidence-driven AI in
healthcare. Topics will include knowledge-grounded question answering, biomedical
document retrieval, multimodal clinical reasoning, personalization, safety, and the
challenges of deploying AI in practice. With a strong emphasis on reproducibility,
evaluation, and responsible application in clinical settings, the workshop will define
the next frontier of knowledge-centric AI in medicine.
SIoTEc 2025 - 6th edition of ACM Workshop on Secure IoT, Edge and Cloud systems
- Antonino Galletta
- Javid Taheri
- Giuseppe Di Modica
- Annamaria Ficara
In recent years, we have seen an increase in the number of Artificial Intelligence
(AI)-powered applications for information retrieval and data science. This has led
to an increasing reliance on distributed computing infrastructures, including Cloud,
Edge, and IoT environments. These architectures enable powerful and scalable solutions
but also introduce new security and privacy risks that must be addressed at both the
system and data levels. Even a single breach in any link of the data-service-infrastructure
chain may seriously compromise the security of the end-user application. With such
a wide attack surface, security must be approached holistically and
addressed at every layer where concerns may arise. SIoTEc solicits novel
and innovative ideas, proposals, positions and best practices that address the modelling,
design, implementation, and enforcement of security in Cloud/Edge/IoT environments.
Workshop website: https://siotec.netsons.org/
The 1st International Workshop on Retrieval-driven Generative AI & ScienceON AI Challenge:
RDGENAI 2025
- Eunhui Kim
- Caren Han
- KeyongTae Lim
- Yesim Selcuk
- Moonki Back
- SangJun Han
- KyongHa Lee
Retrieval-augmented generation (RAG) has rapidly emerged as a cornerstone for building
trustworthy and efficient generative AI systems, spanning from unimodal question answering
to complex multimodal reasoning. The 1st international workshop on retrieval-driven
generative AI gathers researchers and practitioners focused on the applied side of
RAG, particularly for visually rich document understanding (VRDU), where text, layout,
and images intertwine. New in 2025, this workshop is co-located with the ScienceON
AI Challenge, an open competition benchmarking the reliability of AI-generated summaries
over ScienceON's public OpenAPI search results. By combining a research workshop with
a hands-on challenge, we provide a complete pipeline from algorithms to real-world
evaluation. The half-day event features invited keynotes, peer-reviewed papers, challenge
finalist talks, and a poster/demo session, catalyzing collaboration toward robust,
explainable, and domain-adapted generative AI.
Advances in Financial AI: Innovations, Risk, and Responsibility in the Era of LLMs
- Yongjae Lee
- Nazanin Mehrasa
- Chanyeol Choi
- Chung-Chi Chen
- Dhagash Mehta
- Stefan Zohren
- Yoon Kim
- Chulheum Lee
- Yeonhee Lee
- Eunsook Oh
The finance sector is seeing a rapid increase in the application of machine learning
and AI, with Large Language Models (LLMs), ESG (Environmental, Social, and Governance)
investing, and AI Safety significantly reshaping the field. This workshop focuses
on how these advancements intersect with core financial AI applications. We will foster
interdisciplinary discussion on applying LLMs to finance, addressing challenges in
multilingual and non-English markets like Korea. The event will also highlight the
integration of ESG signals into algorithmic decision-making and explore AI Safety,
emphasizing reliability, fairness, and explainability for AI systems in regulated
financial environments. By bringing together experts from academia, industry, and
regulatory bodies, the workshop aims to stimulate discussions on practical issues,
ethical dilemmas, and cutting-edge research shaping financial AI's future. We welcome
submissions that combine technical rigor with societal relevance in AI-driven financial
decisions.
SmaLLEXT: 1st Workshop on Small and Efficient Large Language Models for Knowledge
Extraction
- Felice Antonio Merra
- Kristian Skračić
- Daniele Malitesta
- Jacek Golebiowski
- Pasquale Minervini
The SmaLLEXT workshop (Small and Efficient LLMs for Knowledge Extraction) brings together
researchers and practitioners working on compact models that can run under tight memory
and latency budgets while still delivering state-of-the-art accuracy in extracting
structured knowledge from unstructured sources. Today's production pipelines in finance,
healthcare, legal, and web-scale analytics need extraction systems that are fast,
verifiable, and economical; current 10B+-parameter models remain out of reach for
many such settings.
This workshop focuses on techniques for building smaller, faster, and more adaptable
LLMs tailored to knowledge extraction (KE) tasks. Core topics include model compression,
knowledge distillation, efficient fine-tuning, optimized retrieval-augmented generation
(RAG), and hybrid symbolic-neural approaches. Special emphasis will be placed on practical
challenges, such as reducing latency, ensuring factual consistency, and improving
robustness in noisy or low-resource settings. By convening a diverse group of experts,
the workshop aims to deepen the community's understanding of how lightweight LLMs
can be effectively applied to extract, structure, and reason over knowledge at scale.
Participants will gain exposure to recent breakthroughs, real-world deployments, and
emerging research directions, fostering collaboration around the building of deployable
high-impact KE systems in both academic and industrial contexts.
The International Workshop on Spatio-Temporal Data Intelligence and Foundation Models
- Hao Miao
- Yan Zhao
- Yuxuan Liang
- Bin Yang
- Kai Zheng
- Christian S. Jensen
Spatio-temporal data intelligence, which includes sensing, managing, and mining large-scale
data across space and time, plays a pivotal role in understanding complex systems
in real-world applications, such as urban computing and smart cities. With the rapid
evolution of foundation models and their growing potential to transform spatio-temporal
analytics, we propose a comprehensive half-day workshop (with at least 5 accepted
papers, 3 keynote talks, 1 panel discussion, and over 50 attendees) at CIKM 2025,
catering to professionals, researchers, and practitioners who are interested in spatio-temporal
data intelligence and foundation models to address real-world challenges. The workshop
will not only offer a platform for knowledge exchange but also acknowledge outstanding
contributions through a distinguished Best Paper Award. A dedicated panel discussion
will explore recent advances, emerging trends, and open challenges in integrating
spatio-temporal data and emerging machine learning techniques, fostering dialogue
between academia and industry. Note that this will be the eleventh time that our core
members have organized a similar workshop. The previous 10 workshops were hosted in
top-tier data mining and management venues, e.g., SIGKDD, WWW, and IJCAI, each of
which attracted over 60 participants and 25 submissions on average.
Recommender Systems for Sustainable Development through Responsible Nudging
- Mehrdad Rostami
- Alexander Felfernig
- Wolfgang Wörndl
- Mourad Oussalah
- Avishek Anand
- Mahdi Jalili
- Ashmi Banerjee
Recommender Systems (RS) influence everyday decisions, yet most remain optimized for
short-term engagement or commercial gain. RS4SD aims to shift this focus by exploring
how RS can contribute to sustainable development through behavioral change and nudging
strategies. Aligned with the UN Sustainable Development Goals (SDGs), RS4SD will highlight
applications that promote responsible consumption, sustainable mobility, healthy eating,
and digital well-being. In particular, we will focus on how AI and RS can be designed
to foster sustainable behaviors through multi-objective optimization and ethically
aligned interventions. These objectives are directly tied to the UN SDGs, and we welcome
all contributions showcasing RS in support of these goals. A central theme of the
workshop is the integration of behavioral science and AI to design interventions that
guide users toward more sustainable and healthier choices while preserving individual
autonomy. Topics of interest include multi-objective recommendation, health-aware
RS, eco-friendly product and tourism RS, as well as novel evaluation metrics that
go beyond accuracy to capture societal impact. RS4SD will bring together researchers,
stakeholders and practitioners from RS, AI, sustainability, and behavioral science
to share models, datasets, frameworks, and real-world use cases. The workshop encourages
interdisciplinary collaboration and aims to build a community dedicated to responsible,
behavior-aware RS that benefit both individuals and society.
Frontiers in Graph Machine Learning for the Large Model Era
- Qingyun Sun
- Ziwei Zhang
- Xingcheng Fu
- Yangqiu Song
- Jianxin Li
- Philip S. Yu
The "Frontiers in Graph Machine Learning for the Large Model Era (GMLLM'25)" workshop focuses on advancing graph machine learning (GML) techniques in the context
of increasingly large and powerful models. Graphs offer a principled way to represent
structured and relational data, making them essential for capturing complex dependencies
in knowledge, systems, and behaviors. As the scale and influence of foundation models
grow, graph learning stands at a unique vantage point to enhance model robustness,
improve interpretability, and integrate domain-specific relational priors. This workshop
explores how graph learning can support emerging needs in knowledge reasoning, temporal
and multi-hop inference, and AI systems. It also investigates how advances in representation
learning, structure-aware generalization, and efficient graph processing can contribute
to trustworthy and scalable AI systems. By convening experts in graph learning, knowledge
management, and LLMs, the workshop aims to identify core challenges and opportunities
of GML in the large model era.
Trustworthy Knowledge Discovery and Data Mining (TrustKDD)
- Le Wu
- Jindong Wang
- Ling Chen
- Xiangyu Zhao
- Kui Yu
- Yashar Deldjoo
- Defu Lian
The explosion of data and the widespread adoption of AI techniques, especially the
success of foundation models and generative AI, have transformed knowledge discovery
and data mining (KDD), making them integral to real-world decision-making. For both
traditional AI methods and generative AI, issues such as data noise, algorithmic bias,
lack of interpretability, and privacy concerns can significantly impact the quality
and reliability of extracted knowledge, thereby affecting downstream decision-making.
This workshop aims to bring together researchers and practitioners from information
and knowledge management, data mining, and intelligent systems to explore trustworthy
KDD across diverse settings in the generative AI era. We welcome contributions on
robust data preprocessing, explainable learning algorithms, bias detection and mitigation,
secure and privacy-preserving mining, trustworthy knowledge graph construction, resource-efficient
deployment, alignment of foundation models, and applications for social good. Special
emphasis is placed on emerging challenges posed by large-scale, pre-trained models
in dynamic, multi-source, and user-centric environments. By fostering dialogue between
traditional KDD approaches and innovations in the foundation model era, TrustKDD seeks
to advance trustworthy methodologies that align with CIKM's mission of developing
reliable, scalable, and intelligent information and knowledge systems.
The 1st Workshop on LLM Agents for Social Simulation
- Yige Yuan
- Junkai Zhou
- Bingbing Xu
- Liang Pang
- Du Su
- An Zhang
- Teng Xiao
- Fengli Xu
- Zhaochun Ren
- Xu Chen
Social simulation has long played a crucial role in exploring the mechanisms underlying
human behavior and societal structures. Traditional social simulation relies on rule-based
or statistical models, which makes it difficult to capture the complexity and variability
of the real world. With the emergence and rapid development of large language models
(LLMs), new frontiers have opened toward leveraging LLMs as agents to model human
behavior and interactions. This cutting-edge direction has gained significant attention
and demonstrated promising results, not only advancing research across a wide range
of social science disciplines, but also enabling practical applications in role-playing
scenarios. However, this field still faces multiple challenges, such as capturing
real-world social phenomena, mitigating bias and ethical concerns, and ensuring
usability and reliability. This workshop on LLM Agents for Social Simulation (LASS)
aims to bring together researchers and practitioners from diverse backgrounds to foster
interdisciplinary collaboration, address key challenges, explore new technologies,
and chart promising future directions in this rapidly evolving field.