Tingting Wu

PhD Student


I’m a Ph.D. student at the Research Center for Social Computing and Information Retrieval (SCIR), Harbin Institute of Technology (HIT), China. I am co-advised by Prof. Ting Liu and Prof. Xiao Ding. My research interests include machine learning, label-noise learning, and NLP applications.


Publications

STGN: an Implicit Regularization Method for Learning with Noisy Labels in Natural Language Processing. EMNLP 2022



Tingting Wu, Xiao Ding, Minji Tang, Hao Zhang, Bing Qin, Ting Liu

We propose a novel stochastic tailor-made gradient noise (STGN), which mitigates the effect of inherent label noise by introducing tailor-made benign noise for each sample. Specifically, we investigate multiple principles to precisely and stably discriminate correct samples from incorrect ones and thus apply different intensities of perturbation to them. A detailed theoretical analysis shows that STGN has good properties that benefit model generalization. Experiments on three different NLP tasks demonstrate the effectiveness and versatility of STGN. Moreover, STGN can boost existing robust training methods.
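As a rough illustration of the idea (a minimal sketch, not the exact STGN formulation; the loss-median split and the noise scales below are assumptions), one can perturb each sample's contribution to the objective with noise whose intensity depends on whether the sample looks clean or mislabeled:

```python
import torch

def stgn_like_loss(per_sample_loss, clean_sigma=0.01, noisy_sigma=0.5):
    # per_sample_loss: shape (batch,), e.g. cross-entropy with reduction="none"
    with torch.no_grad():
        threshold = per_sample_loss.median()               # crude clean/noisy split
        looks_noisy = per_sample_loss > threshold
        sigma = torch.where(looks_noisy,
                            torch.full_like(per_sample_loss, noisy_sigma),
                            torch.full_like(per_sample_loss, clean_sigma))
        noise = torch.randn_like(per_sample_loss) * sigma  # tailor-made noise per sample
    # the noise multiplies each sample's loss, so it perturbs that sample's gradient
    return (per_sample_loss * (1.0 + noise)).mean()
```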

DiscrimLoss: A Universal Loss for Hard Samples and Incorrect Samples Discrimination. IEEE Transactions on Multimedia 2023



Tingting Wu, Xiao Ding, Hao Zhang, Jinglong Gao, Minji Tang, Li Du, Bing Qin, Ting Liu

We propose to append a novel loss function, DiscrimLoss, on top of the existing task loss. Its main effect is to automatically and stably estimate the importance of easy and difficult samples (including hard and incorrect samples) at the early stages of training to improve model performance. Then, during the following stages, DiscrimLoss is dedicated to discriminating between hard and incorrect samples to improve model generalization. Such a training strategy can be formulated dynamically in a self-supervised manner, effectively mimicking the main principle of curriculum learning. Experiments on image classification, image regression, text sequence regression, and event relation reasoning demonstrate the versatility and effectiveness of our method, particularly in the presence of diversified noise levels.
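A minimal sketch of the general mechanism (the weighting form, regularizer, and names below are illustrative assumptions, not the paper's exact DiscrimLoss): a learnable per-sample weight is trained jointly with the task loss so that likely-incorrect samples are progressively down-weighted.

```python
import torch
import torch.nn as nn

class PerSampleWeightedLoss(nn.Module):
    # one learnable importance weight per training example, updated with the model
    def __init__(self, num_samples, reg=0.1):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_samples))
        self.reg = reg

    def forward(self, task_loss, sample_idx):
        # task_loss: per-sample losses, shape (batch,); sample_idx: their dataset indices
        w = torch.sigmoid(self.logits[sample_idx])      # importance in (0, 1)
        penalty = ((1.0 - w) ** 2).mean()               # discourage discarding everything
        return (w * task_loss).mean() + self.reg * penalty
```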

NoisywikiHow: A Benchmark for Learning with Real-world Noisy Labels in Natural Language Processing. Findings of ACL 2023



Tingting Wu, Xiao Ding, Minji Tang, Hao Zhang, Bing Qin, Ting Liu

We contribute NoisywikiHow, the largest NLP benchmark built with minimal supervision. Specifically, inspired by human cognition, we explicitly construct multiple sources of label noise to imitate human errors throughout the annotation process, replicating real-world noise whose corruption is affected by both ground-truth labels and instances. Moreover, we provide a variety of noise levels to support controlled experiments on noisy data, enabling us to evaluate LNL methods systematically and comprehensively. We then conduct extensive multi-dimensional experiments on a broad range of LNL methods, obtaining new and intriguing findings.
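For intuition, a simplified sketch of controlled, label-dependent noise injection (the helper below is hypothetical and much cruder than the NoisywikiHow construction):

```python
import random

def inject_label_noise(labels, num_classes, noise_level, confusable=None, seed=0):
    # confusable: optional dict mapping a class to the classes it is often mistaken for,
    # so corruption depends on the ground-truth label rather than being uniform
    rng = random.Random(seed)
    noisy = list(labels)
    for i, y in enumerate(noisy):
        if rng.random() < noise_level:
            pool = (confusable or {}).get(y, list(range(num_classes)))
            candidates = [c for c in pool if c != y]
            if candidates:
                noisy[i] = rng.choice(candidates)
    return noisy
```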

Quadtree-based optimal path routing with the smallest routing table size. IEEE Global Communications Conference 2014



Tingting Wu, Chi Zhang, Nenghai Yu, Miao Pan

We present a novel geographical routing mechanism that guarantees the optimal path with minimum overhead. By utilizing geographical location information and the quadtree data structure, the routing table size can be reduced to its information-theoretic lower bound. Our theoretical analysis suggests that our scheme achieves a smaller routing table than the best IP-based routing table compression result in the literature.
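A toy sketch of the underlying quadtree idea (illustrative only; the actual routing and aggregation scheme is in the paper): a location is encoded as a quadrant prefix, so destinations sharing a prefix fall in the same cell and can share one routing entry.

```python
def quadtree_code(x, y, depth, xmin=0.0, ymin=0.0, xmax=1.0, ymax=1.0):
    # encode a 2-D location as a sequence of quadrant digits
    code = ""
    for _ in range(depth):
        xmid, ymid = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
        quad = (2 if y >= ymid else 0) + (1 if x >= xmid else 0)
        code += str(quad)
        xmin, xmax = (xmid, xmax) if x >= xmid else (xmin, xmid)
        ymin, ymax = (ymid, ymax) if y >= ymid else (ymin, ymid)
    return code
```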

CC-FedAvg: Computationally Customized Federated Averaging. IEEE Internet of Things Journal 2023



Hao Zhang, Tingting Wu, Siyao Cheng, Jie Liu

We propose a strategy for estimating local models without computationally intensive iterations. Based on this strategy, we propose Computationally Customized Federated Averaging (CC-FedAvg), which allows participants to determine whether to perform traditional local training or model estimation in each round based on their current computational budgets. Both theoretical analysis and exhaustive experiments indicate that CC-FedAvg has the same convergence rate and comparable performance as FedAvg without resource constraints. Furthermore, CC-FedAvg can be viewed as a computation-efficient version of FedAvg that retains model performance while considerably lowering computation overhead.
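A high-level sketch of the round logic (the estimation rule below, reusing a client's previous update direction, is an assumption for illustration rather than the paper's exact estimator):

```python
import numpy as np

def cc_fedavg_round(global_w, clients, budget, local_train, prev_update):
    # budget[cid] >= 1.0 means the client can afford full local training this round;
    # otherwise it estimates its local model instead of running local iterations
    new_models = []
    for cid in clients:
        if budget[cid] >= 1.0:
            local_w = local_train(cid, global_w)          # ordinary local SGD
            prev_update[cid] = local_w - global_w
        else:
            local_w = global_w + prev_update.get(cid, np.zeros_like(global_w))
        new_models.append(local_w)
    return np.mean(new_models, axis=0), prev_update       # FedAvg aggregation
```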

Data Augmentation Based Federated Learning. IEEE Internet of Things Journal 2023



Hao Zhang, Qingying Hou, Tingting Wu, Siyao Cheng, Jie Liu

We enhance the FL model by focusing on the data rather than on model training: we reduce data heterogeneity by augmenting the local training data to improve FL performance. Specifically, we propose an FL method based on data augmentation (abbr. FedM-UNE), implementing the classic data augmentation method MixUp in federated scenarios without transferring raw data. Furthermore, to adapt this method to regression tasks, we first modify MixUp with bilateral neighborhood expansion (MixUp-BNE) and then propose a federated data augmentation method named FedM-BNE based on it. Compared with the conventional FL method, both FedM-UNE and FedM-BNE incur negligible overhead. To demonstrate the effectiveness, we conduct exhaustive experiments on six datasets employing a variety of loss functions. The results indicate that FedM-UNE and FedM-BNE consistently improve the performance of the FL model. Moreover, our methods are compatible with existing FL enhancements, which yields further improvements in performance.
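For reference, a minimal MixUp sketch as it would run inside each client's local update (names and defaults are illustrative; the federated protocol details of FedM-UNE are in the paper):

```python
import torch

def mixup_batch(x, y, alpha=0.2):
    # x: (batch, ...) inputs; y: (batch, num_classes) one-hot or soft labels
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix
```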

Dynamic layer-wise sparsification for distributed deep learning. FGCS 2023



Hao Zhang, Tingting Wu, Zhifeng Ma, Feng Li, Jie Liu

We introduce a Dynamic Layer-wise Sparsification (DLS) mechanism and its extensions, DLS(s). DLS(s) efficiently adjusts the per-layer sparsity ratios so that each layer's upload threshold automatically tends toward a unified global one, retaining the good performance of Global Top-k SGD and the high efficiency of Layer-wise Top-k SGD. The experimental results show that DLS(s) outperforms Layer-wise Top-k SGD and performs close to Global Top-k SGD with much less training time.
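A sketch of layer-wise Top-k gradient sparsification with per-layer keep ratios (the DLS rule for adapting these ratios toward a unified global threshold is omitted here):

```python
import torch

def layerwise_topk(grads, ratios):
    # grads: dict of layer name -> gradient tensor; ratios: dict of name -> keep ratio
    sparse = {}
    for name, g in grads.items():
        k = max(1, int(ratios[name] * g.numel()))
        thresh = g.abs().flatten().topk(k).values.min()   # this layer's upload threshold
        sparse[name] = torch.where(g.abs() >= thresh, g, torch.zeros_like(g))
    return sparse
```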

Aperiodic Local SGD: Beyond Local SGD. ICPP 2022



Hao Zhang, Tingting Wu, Siyao Cheng, Jie Liu

We investigate local SGD with an arbitrary synchronization scheme to answer two questions: (1) Is the periodic synchronization scheme the best? (2) If not, what is the optimal one? First, for any synchronization scheme, we derive the performance bound under fixed overhead and formulate the performance optimization under given computation and communication constraints. We then find a succinct property of the optimal scheme: the number of local iterations decreases as training continues, which indicates that the periodic scheme is suboptimal. Furthermore, with some reasonable approximations, we obtain an explicit form of the optimal scheme and propose Aperiodic Local SGD (ALSGD) as an improved substitute for local SGD without any overhead increment. Our experiments also confirm that with the same computation and communication overhead, ALSGD outperforms local SGD in performance, especially on heterogeneous data.
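An illustrative aperiodic schedule (the geometric decay below is an assumption; ALSGD derives its schedule from the paper's analysis), showing local iteration counts that shrink over rounds while keeping the same total local computation as a periodic scheme:

```python
def aperiodic_schedule(total_local_steps, num_rounds, decay=0.7):
    # front-load local computation: more local iterations early, fewer later,
    # while preserving the overall number of local steps
    raw = [decay ** r for r in range(num_rounds)]
    scale = total_local_steps / sum(raw)
    return [max(1, round(scale * v)) for v in raw]
```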

FedCos: A Scene-Adaptive Enhancement for Federated Learning. IEEE Internet of Things Journal 2022



Hao Zhang, Tingting Wu, Siyao Cheng, Jie Liu

We propose FedCos, which reduces the directional inconsistency of local models by introducing a cosine-similarity penalty that pushes local model iterations toward an auxiliary global direction. Moreover, our approach adapts automatically to various non-IID (not independent and identically distributed) settings without an elaborate selection of hyperparameters. Experimental results on both vision and language tasks with a variety of models (including CNN, ResNet, LSTM, etc.) show that FedCos outperforms well-known baselines and can enhance them under a variety of FL scenes, including varying degrees of data heterogeneity, different numbers of participants, and cross-silo and cross-device settings. Besides, FedCos improves communication efficiency by 2–5 times. With the help of FedCos, multiple FL methods require significantly fewer communication rounds than before to obtain a comparable model.
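A sketch of the cosine-similarity penalty on a client's local objective (the sign convention, weighting, and choice of auxiliary global direction are simplified assumptions):

```python
import torch
import torch.nn.functional as F

def fedcos_penalty(local_w, global_w, global_dir, mu=0.1):
    # all arguments are flattened 1-D parameter tensors; the returned term is added
    # to the client's task loss so its update direction aligns with global_dir
    local_dir = local_w - global_w
    cos = F.cosine_similarity(local_dir, global_dir, dim=0)
    return mu * (1.0 - cos)
```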