Publications
2024
- ConferenceFundamental Convergence Analysis of Sharpness-Aware MinimizationP. D. Khanh, H-C Luong, B. S. Mordukhovich, and D. B. TranAdvances in Neural Information Processing Systems, 2024
The paper investigates the fundamental convergence properties of Sharpness-Aware Minimization (SAM), a recently proposed gradient-based optimization method (Foret et al., 2021) that significantly improves the generalization of deep neural networks. The convergence properties including the stationarity of accumulation points, the convergence of the sequence of gradients to the origin, the sequence of function values to the optimal value, and the sequence of iterates to the optimal solution are established for the method. The universality of the provided convergence analysis based on inexact gradient descent frameworks (Khanh et al., 2023b) allows its extensions to the normalized versions of SAM such as VaSSO (Li & Giannakis, 2023), RSAM (Liu et al., 2022), and to the unnormalized versions of SAM such as USAM (Andriushchenko & Flammarion, 2022). Numerical experiments are conducted on classification tasks using deep learning models to confirm the practical aspects of our analysis.
@article{klmtneurips, title = {Fundamental Convergence Analysis of Sharpness-Aware Minimization}, author = {Khanh, P. D. and Luong, H-C and Mordukhovich, B. S. and Tran, D. B.}, journal = {Advances in Neural Information Processing Systems}, year = {2024}, url = {https://papers.nips.cc/paper_files/paper/2024/hash/17b08a9de93e2accf13429643e7eafdc-Abstract-Conference.html}, dimensions = {true}, }
- JournalGlobally convergent coderivative-based generalized Newton methods in nonsmooth optimizationP. D. Khanh, B. S. Mordukhovich, V. T Phat, and D. B. TranMathematical Programming, 2024
This paper proposes and justifies two globally convergent Newton-type methods to solve unconstrained and constrained problems of nonsmooth optimization by using tools of variational analysis and generalized differentiation. Both methods are coderivative-based and employ generalized Hessians (coderivatives of subgradient mappings) associated with objective functions, which are either of class C1,1, or are represented in the form of convex composite optimization, where one of the terms may be extended-real-valued. The proposed globally convergent algorithms are of two types. The first one extends the damped Newton method and requires positive-definiteness of the generalized Hessians for its well-posedness and efficient performance, while the other algorithm is of the regularized Newton type being well-defined when the generalized Hessians are merely positive-semidefinite. The obtained convergence rates for both methods are at least linear, but become superlinear under the semismooth∗ property of subgradient mappings. Problems of convex composite optimization are investigated with and without the strong convexity assumption on smooth parts of objective functions by implementing the machinery of forward-backward envelopes. Numerical experiments are conducted for Lasso problems and for box constrained quadratic programs with providing performance comparisons of the new algorithms and some other first-order and second-order methods that are highly recognized in nonsmooth optimization.
@article{kmptMAPR, title = {Globally convergent coderivative-based generalized Newton methods in nonsmooth optimization}, author = {Khanh, P. D. and Mordukhovich, B. S. and Phat, V. T and Tran, D. B.}, journal = {Mathematical Programming}, volume = {205}, pages = {373-429}, year = {2024}, doi = {10.1007/s10107-023-01980-2}, publisher = {Springer}, dimensions = {true}, }
- JournalA new inexact gradient descent method with applications to nonsmooth convex optimizationP. D. Khanh, B. S. Mordukhovich, and D. B. TranOptimization Methods and Software, 2024
The paper proposes and develops a novel inexact gradient method (IGD) for minimizing C1-smooth functions with Lipschitzian gradients, i.e., for problems of C1,1 optimization. We show that the sequence of gradients generated by IGD converges to zero. The convergence of iterates to stationary points is guaranteed under the Kurdyka- Lojasiewicz (KL) property of the objective function with convergence rates depending on the KL exponent. The newly developed IGD is applied to designing two novel gradient-based methods of nonsmooth convex optimization such as the inexact proximal point methods (GIPPM) and the inexact augmented Lagrangian method (GIALM) for convex programs with linear equality constraints. These two methods inherit global convergence properties from IGD and are confirmed by numerical experiments to have practical advantages over some well-known algorithms of nonsmooth convex optimization.
@article{kmtOMS, title = {A new inexact gradient descent method with applications to nonsmooth convex optimization}, author = {Khanh, P. D. and Mordukhovich, B. S. and Tran, D. B.}, journal = {Optimization Methods and Software}, publisher = {Taylor and Francis}, pages = {1-29}, year = {2024}, doi = {10.1080/10556788.2024.2322700}, dimensions = {true}, }
- JournalInexact proximal methods for weakly convex functionsP. D. Khanh, B. S. Mordukhovich, V. T Phat, and D. B. Tranto appear Journal of Global Optimization, 2024
This paper proposes and develops inexact proximal methods for finding stationary points of the sum of a smooth function and a nonsmooth weakly convex one, where an error is present in the calculation of the proximal mapping of the nonsmooth term. A general framework for finding zeros of a continuous mapping is derived from our previous paper on this subject to establish convergence properties of the inexact proximal point method when the smooth term is vanished and of the inexact proximal gradient method when the smooth term satisfies a descent condition. The inexact proximal point method achieves global convergence with constructive convergence rates when the Moreau envelope of the objective function satisfies the Kurdyka-Lojasiewicz (KL) property. Meanwhile, when the smooth term is twice continuously differentiable with a Lipschitz continuous gradient and a differentiable approximation of the objective function satisfies the KL property, the inexact proximal gradient method achieves the global convergence of iterates with constructive convergence rates.
@article{kmptJOGOprox, title = {Inexact proximal methods for weakly convex functions}, author = {Khanh, P. D. and Mordukhovich, B. S. and Phat, V. T and Tran, D. B.}, journal = {to appear Journal of Global Optimization}, dimensions = {true}, year = {2024}, publisher = {Springer} }
- PreprintGlobally Convergent Derivative-Free Methods in Nonconvex Optimization with and without NoiseP. D. Khanh, B. S. Mordukhovich, and D. B. Tran2024
This paper addresses the study of nonconvex derivative-free optimization problems, where only information of either smooth objective functions or their noisy approximations is available. General derivative-free methods are proposed for minimizing differentiable (not necessarily convex) functions with globally Lipschitz continuous gradients, where the accuracy of approximate gradients is interacting with stepsizes and exact gradient values. Analysis in the noiseless case guarantees convergence of the gradient sequence to the origin as well as global convergence with constructive convergence rates of the sequence of iterates under the Kurdyka-Łojasiewicz property. In the noisy case, without any noise level information, the designed algorithms reach near-stationary points with providing estimates on the required number of iterations and function evaluations. Addressing functions with locally Lipschitzian gradients, two algorithms are introduced to handle the noiseless and noisy cases, respectively. The noiseless version is based on the standard backtracking linesearch and achieves fundamental convergence properties similarly to the global Lipschitzian case. The noisy version is based on a novel bidirectional linesearch and is shown to reach near-stationary points after a finite number of iterations when the Polyak-Łojasiewicz inequality is imposed. Numerical experiments are conducted on a diverse set of test problems to demonstrate more robustness of the newly proposed algorithms in comparison with other finite-difference-based schemes and some highly efficient, production-ready codes from the SciPy library.
@article{kmtMAPR, title = {Globally Convergent Derivative-Free Methods in Nonconvex Optimization with and without Noise}, author = {Khanh, P. D. and Mordukhovich, B. S. and Tran, D. B.}, url = {https://optimization-online.org/?p=26889}, year = {2024}, publisher = {Springer}, }
- PreprintLocal Convergence Analysis for Nonisolated Solutions to Derivative-Free Methods of OptimizationD. H. Cuong, P. D. Khanh, B. S. Mordukhovich, and D. B. Tran2024
Local Convergence Analysis for Nonisolated Solutions to Derivative-Free Methods of Optimization
@article{ckmtOptim, title = {Local Convergence Analysis for Nonisolated Solutions to Derivative-Free Methods of Optimization}, author = {Cuong, D. H. and Khanh, P. D. and Mordukhovich, B. S. and Tran, D. B.}, url = {https://optimization-online.org/?p=27216}, year = {2024}, publisher = {Springer} }
- ConferenceConvergence of Sharpness-Aware Minimization with MomentumP. D. Khanh, H.-C. Luong, B. S. Mordukhovich, D. B. Tran, and 1 more author2024
This paper presents an analysis of Sharpness-Aware Minimization (SAM), a recently introduced efficient optimizer that has demonstrated remarkable improvements in the generalization of deep neural networks. A comprehensive analysis of the asymptotic convergence behaviors of the optimizer when integrated with momentum is proposed. We first show the convergence of the gradient sequence to zero and the topological properties of the set of accumulation points generated by the iterative sequence. Under the assumption of the isolation of stationary points, especially when the function is strongly convex, the convergence of the sequence of iterates is ensured. To validate the practical implications of our analysis, we conduct numerical experiments on classification tasks employing well-known deep learning models, including ResNet18 and ResNet34, with standard datasets CIFAR-10, CIFAR-100, MNIST, and Fashion-MNIST. The numerical results show that, in general, incorporating momentum improves both the training process and testing accuracy for SAM rather than just using standard SGD.
@article{klmttOptim, title = {Convergence of Sharpness-Aware Minimization with Momentum}, author = {Khanh, P. D. and Luong, H.-C. and Mordukhovich, B. S. and Tran, D. B. and Vo, T.}, url = {https://link.springer.com/chapter/10.1007/978-3-031-73420-5_11}, year = {2024}, publisher = {Springer} }
2023
- JournalGeneralized damped Newton algorithms in nonsmooth optimization via second-order subdifferentialsP. D. Khanh, B. S. Mordukhovich, V. T Phat, and D. B. TranJournal of Global Optimization, 2023
The paper proposes and develops new globally convergent algorithms of the generalized damped Newton type for solving important classes of nonsmooth optimization problems. These algorithms are based on the theory and calculations of second-order subdifferentials of nonsmooth functions with employing the machinery of second-order variational analysis and generalized differentiation. First we develop a globally superlinearly convergent damped Newton-type algorithm for the class of continuously differentiable functions with Lipschitzian gradients, which are nonsmooth of second order. Then we design such a globally convergent algorithm to solve a structured class of nonsmooth quadratic composite problems with extended-real-valued cost functions, which typically arise in machine learning and statistics. Finally, we present the results of numerical experiments and compare the performance of our main algorithm applied to an important class of Lasso problems with those achieved by other first-order and second-order optimization algorithms.
@article{kmptJOGO, title = {Generalized damped Newton algorithms in nonsmooth optimization via second-order subdifferentials}, author = {Khanh, P. D. and Mordukhovich, B. S. and Phat, V. T and Tran, D. B.}, journal = {Journal of Global Optimization}, volume = {86}, pages = {93-122}, year = {2023}, doi = {10.1007/s10898-022-01248-7}, publisher = {Springer}, dimensions = {true}, }
- JournalInexact reduced gradient methods in nonconvex optimizationP. D. Khanh, B. S. Mordukhovich, and D. B. TranJournal of Optimization Theory and Applications, 2023
This paper proposes and develops new linesearch methods with inexact gradient information for finding stationary points of nonconvex continuously differentiable functions on finite-dimensional spaces. Some abstract convergence results for a broad class of linesearch methods are stablished. A general scheme for inexact reduced gradient (IRG) methods is proposed, where the errors in the gradient approximation automatically adapt with the magnitudes of the exact gradients. The sequences of iterations are shown to obtain stationary accumulation points when different stepsize selections are employed. Convergence results with constructive convergence rates for the developed IRG methods are established under the Kurdyka- Lojasiewicz property. The obtained results for the IRG methods are confirmed by encouraging numerical experiments, which demonstrate advantages of automatically controlled errors in IRG methods over other frequently used error selections.
@article{kmtJOTA, title = {Inexact reduced gradient methods in nonconvex optimization}, author = {Khanh, P. D. and Mordukhovich, B. S. and Tran, D. B.}, journal = {Journal of Optimization Theory and Applications}, pages = {1-41}, year = {2023}, doi = {10.1007/s10957-023-02319-9}, publisher = {Springer}, dimensions = {true}, }