To see the other types of publications on this topic, follow the link: Natural gradient descent.

Journal articles on the topic 'Natural gradient descent'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 journal articles for your research on the topic 'Natural gradient descent.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Stokes, James, Josh Izaac, Nathan Killoran, and Giuseppe Carleo. "Quantum Natural Gradient." Quantum 4 (May 25, 2020): 269. http://dx.doi.org/10.22331/q-2020-05-25-269.

Abstract:
A quantum generalization of Natural Gradient Descent is presented as part of a general-purpose optimization framework for variational quantum circuits. The optimization dynamics is interpreted as moving in the steepest descent direction with respect to the Quantum Information Geometry, corresponding to the real part of the Quantum Geometric Tensor (QGT), also known as the Fubini-Study metric tensor. An efficient algorithm is presented for computing a block-diagonal approximation to the Fubini-Study metric tensor for parametrized quantum circuits, which may be of independent interest.
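
The update rule this abstract describes can be sketched in a few lines. The following is a minimal illustration rather than the authors' implementation: `metric` stands for the (block-diagonal approximation of the) Fubini-Study metric tensor and `grad` for the objective gradient, both assumed to be supplied by the quantum-circuit evaluation.

```python
import numpy as np

def natural_gradient_step(theta, grad, metric, lr=0.1, ridge=1e-8):
    """One natural-gradient update: theta <- theta - lr * g^{-1} grad.

    A small ridge term keeps the (possibly singular) metric solvable;
    solving the linear system avoids forming an explicit inverse.
    """
    g = metric + ridge * np.eye(theta.size)
    return theta - lr * np.linalg.solve(g, grad)
```
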
2

Rattray, Magnus, David Saad, and Shun-ichi Amari. "Natural Gradient Descent for On-Line Learning." Physical Review Letters 81, no. 24 (December 14, 1998): 5461–64. http://dx.doi.org/10.1103/physrevlett.81.5461.

3

Heskes, Tom. "On “Natural” Learning and Pruning in Multilayered Perceptrons." Neural Computation 12, no. 4 (April 1, 2000): 881–901. http://dx.doi.org/10.1162/089976600300015637.

Abstract:
Several studies have shown that natural gradient descent for on-line learning is much more efficient than standard gradient descent. In this article, we derive natural gradients in a slightly different manner and discuss implications for batch-mode learning and pruning, linking them to existing algorithms such as Levenberg-Marquardt optimization and optimal brain surgeon. The Fisher matrix plays an important role in all these algorithms. The second half of the article discusses a layered approximation of the Fisher matrix specific to multilayered perceptrons. Using this approximation rather than the exact Fisher matrix, we arrive at much faster “natural” learning algorithms and more robust pruning procedures.
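
To make the role of the Fisher matrix concrete, here is a hedged numpy sketch of a batch natural-gradient step using the empirical (outer-product) Fisher approximation; the article itself works with the exact and layered Fisher matrices, so treat this only as an illustration of the general recipe. The damping term echoes the Levenberg-Marquardt connection mentioned above.

```python
import numpy as np

def natural_gradient_batch_step(theta, per_sample_grads, lr=0.01, damping=1e-4):
    """Natural-gradient step with an empirical Fisher estimate.

    per_sample_grads: (N, P) array of per-example loss gradients.
    """
    mean_grad = per_sample_grads.mean(axis=0)
    fisher = per_sample_grads.T @ per_sample_grads / len(per_sample_grads)
    fisher += damping * np.eye(theta.size)  # Levenberg-Marquardt-style damping
    return theta - lr * np.linalg.solve(fisher, mean_grad)
```
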
4

Rattray, Magnus, and David Saad. "Analysis of natural gradient descent for multilayer neural networks." Physical Review E 59, no. 4 (April 1, 1999): 4523–32. http://dx.doi.org/10.1103/physreve.59.4523.

5

Inoue, Masato, Hyeyoung Park, and Masato Okada. "On-Line Learning Theory of Soft Committee Machines with Correlated Hidden Units –Steepest Gradient Descent and Natural Gradient Descent–." Journal of the Physical Society of Japan 72, no. 4 (April 15, 2003): 805–10. http://dx.doi.org/10.1143/jpsj.72.805.

6

Zhao, Pu, Pin-yu Chen, Siyue Wang, and Xue Lin. "Towards Query-Efficient Black-Box Adversary with Zeroth-Order Natural Gradient Descent." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 6909–16. http://dx.doi.org/10.1609/aaai.v34i04.6173.

Abstract:
Despite the great achievements of modern deep neural networks (DNNs), the vulnerability/robustness of state-of-the-art DNNs raises security concerns in many application domains requiring high reliability. Various adversarial attacks have been proposed to sabotage the learning performance of DNN models. Among those, black-box adversarial attack methods have received special attention owing to their practicality and simplicity. Black-box attacks usually prefer fewer queries in order to remain stealthy and keep costs low. However, most current black-box attack methods adopt the first-order gradient descent method, which may come with certain deficiencies such as relatively slow convergence and high sensitivity to hyper-parameter settings. In this paper, we propose a zeroth-order natural gradient descent (ZO-NGD) method to design adversarial attacks, which incorporates the zeroth-order gradient estimation technique catering to the black-box attack scenario and second-order natural gradient descent to achieve higher query efficiency. Empirical evaluations on image classification datasets demonstrate that ZO-NGD can obtain significantly lower model query complexities compared with state-of-the-art attack methods.
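
The zeroth-order ingredient can be illustrated with the standard two-point random-direction estimator, a sketch of the generic technique rather than the exact variant used in ZO-NGD; the natural-gradient preconditioning is then applied to the estimate it returns.

```python
import numpy as np

def zo_gradient(loss, x, num_dirs=20, mu=1e-3):
    """Two-point zeroth-order estimate of grad loss(x) from queries alone."""
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = np.random.randn(*x.shape)  # random probe direction
        g += (loss(x + mu * u) - loss(x - mu * u)) / (2 * mu) * u
    return g / num_dirs
```
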
7

Yang, Howard Hua, and Shun-ichi Amari. "Complexity Issues in Natural Gradient Descent Method for Training Multilayer Perceptrons." Neural Computation 10, no. 8 (November 1, 1998): 2137–57. http://dx.doi.org/10.1162/089976698300017007.

Abstract:
The natural gradient descent method is applied to train an n-m-1 multilayer perceptron. Based on an efficient scheme to represent the Fisher information matrix for an n-m-1 stochastic multilayer perceptron, a new algorithm is proposed to calculate the natural gradient without inverting the Fisher information matrix explicitly. When the input dimension n is much larger than the number of hidden neurons m, the time complexity of computing the natural gradient is O(n).
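
The paper's O(n) scheme exploits the algebraic structure of the n-m-1 perceptron's Fisher matrix. As a more generic sketch of the same principle, never inverting the Fisher matrix explicitly, the natural-gradient direction can be obtained from an iterative solver that only needs Fisher-vector products (`fisher_vec_prod` is an assumed user-supplied function here):

```python
from scipy.sparse.linalg import LinearOperator, cg

def natural_direction(fisher_vec_prod, grad):
    """Solve F d = grad by conjugate gradients; F is never formed or inverted."""
    n = grad.size
    F = LinearOperator((n, n), matvec=fisher_vec_prod)
    d, info = cg(F, grad)  # info == 0 signals convergence
    return d
```
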
8

Park, Hyeyoung, and Kwanyong Lee. "Adaptive Natural Gradient Method for Learning of Stochastic Neural Networks in Mini-Batch Mode." Applied Sciences 9, no. 21 (October 28, 2019): 4568. http://dx.doi.org/10.3390/app9214568.

Abstract:
The gradient descent method is an essential algorithm for the learning of neural networks. Among the diverse variations of gradient descent that have been developed to accelerate learning, natural gradient learning is based on the theory of information geometry on the stochastic neuromanifold and is known to have ideal convergence properties. Despite its theoretical advantages, the pure natural gradient has some limitations that prevent its practical usage. To obtain the explicit value of the natural gradient, one needs to know the true probability distribution of the input variables and to invert a matrix whose dimension equals the number of parameters. Though an adaptive estimation of the natural gradient has been proposed as a solution, it was originally developed for the online learning mode, which is computationally inefficient for learning from large data sets. In this paper, we propose a novel adaptive natural gradient estimation for the mini-batch learning mode, which is commonly adopted for big data analysis. For two representative stochastic neural network models, we present explicit parameter update rules and the learning algorithm. Through experiments on three benchmark problems, we confirm that the proposed method has superior convergence properties to the conventional methods.
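
One way to sidestep both requirements is to track an estimate of the inverse Fisher matrix directly. Below is a minimal sketch in the spirit of Amari-style adaptive natural gradient (the paper's mini-batch update rules differ in detail): it maintains the inverse of the running average G <- (1 - eps) G + eps g gᵀ via the Sherman-Morrison identity, so no matrix is ever inverted.

```python
import numpy as np

def update_inv_fisher(G_inv, g, eps=0.01):
    """Rank-one adaptive update of an inverse-Fisher estimate."""
    Gg = G_inv @ g
    denom = (1.0 - eps) + eps * (g @ Gg)
    return (G_inv - (eps / denom) * np.outer(Gg, Gg)) / (1.0 - eps)

# a natural-gradient step is then simply: theta -= lr * G_inv @ mean_grad
```
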
9

Mukuno, Jun-ichi, and Hajime Matsui. "Natural Gradient Descent of Complex-Valued Neural Networks Invariant under Rotations." IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E102.A, no. 12 (December 1, 2019): 1988–96. http://dx.doi.org/10.1587/transfun.e102.a.1988.

10

Neumann, K., C. Strub, and J. J. Steil. "Intrinsic plasticity via natural gradient descent with application to drift compensation." Neurocomputing 112 (July 2013): 26–33. http://dx.doi.org/10.1016/j.neucom.2012.12.047.

11

Zhuo, Li’an, Baochang Zhang, Chen Chen, Qixiang Ye, Jianzhuang Liu, and David Doermann. "Calibrated Stochastic Gradient Descent for Convolutional Neural Networks." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 9348–55. http://dx.doi.org/10.1609/aaai.v33i01.33019348.

Abstract:
In stochastic gradient descent (SGD) and its variants, the optimized gradient estimators may be as expensive to compute as the true gradient in many scenarios. This paper introduces a calibrated stochastic gradient descent (CSGD) algorithm for deep neural network optimization. A theorem is developed to prove that an unbiased estimator for the network variables can be obtained in a probabilistic way based on the Lipschitz hypothesis. Our work is significantly distinct from existing gradient optimization methods, by providing a theoretical framework for unbiased variable estimation in the deep learning paradigm to optimize the model parameter calculation. In particular, we develop a generic gradient calibration layer which can be easily used to build convolutional neural networks (CNNs). Experimental results demonstrate that CNNs with our CSGD optimization scheme can improve the state-of-the-art performance for natural image classification, digit recognition, ImageNet object classification, and object detection tasks. This work opens new research directions for developing more efficient SGD updates and analyzing the backpropagation algorithm.
12

Zhao, Junsheng, Haikun Wei, Chi Zhang, Weiling Li, Weili Guo, and Kanjian Zhang. "Natural Gradient Learning Algorithms for RBF Networks." Neural Computation 27, no. 2 (February 2015): 481–505. http://dx.doi.org/10.1162/neco_a_00689.

Abstract:
Radial basis function (RBF) networks are one of the most widely used models for function approximation and classification. There are many strange behaviors in the learning process of RBF networks, such as a slow learning speed and the existence of plateaus. The natural gradient learning method can overcome these disadvantages effectively. It can accelerate the dynamics of learning and avoid plateaus. In this letter, we assume that the probability density function (pdf) of the input and the activation function are gaussian. First, we introduce natural gradient learning to the RBF networks and give the explicit forms of the Fisher information matrix and its inverse. Second, since it is difficult to calculate the Fisher information matrix and its inverse when the number of hidden units and the dimension of the input are large, we introduce the adaptive method to the natural gradient learning algorithms. Finally, we give an explicit form of the adaptive natural gradient learning algorithm and compare it to the conventional gradient descent method. Simulations show that the proposed adaptive natural gradient method, which can avoid the plateaus effectively, has a good performance when RBF networks are used for nonlinear function approximation.
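
For reference, a Gaussian RBF network of the kind considered here computes a weighted sum of radial activations. The sketch below shows the forward pass and a plain-gradient update on the output weights (hypothetical shapes: `centers` is (m, d), `widths` and `w` are (m,)); the natural-gradient variant derived in the letter preconditions such updates with the inverse Fisher matrix.

```python
import numpy as np

def rbf_forward(x, centers, widths, w):
    """Output of a Gaussian RBF network for one input vector x."""
    phi = np.exp(-np.sum((x - centers) ** 2, axis=1) / (2 * widths ** 2))
    return phi @ w, phi

def plain_sgd_step(x, y, centers, widths, w, lr=0.05):
    y_hat, phi = rbf_forward(x, centers, widths, w)
    return w + lr * (y - y_hat) * phi  # ordinary gradient step on square error
```
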
13

Schraudolph, Nicol N. "Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent." Neural Computation 14, no. 7 (July 1, 2002): 1723–38. http://dx.doi.org/10.1162/08997660260028683.

Abstract:
We propose a generic method for iteratively approximating various second-order gradient steps—Newton, Gauss-Newton, Levenberg-Marquardt, and natural gradient—in linear time per iteration, using special curvature matrix-vector products that can be computed in O(n). Two recent acceleration techniques for on-line learning, matrix momentum and stochastic meta-descent (SMD), implement this approach. Since both were originally derived by very different routes, this offers fresh insight into their operation, resulting in further improvements to SMD.
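
The key observation is that second-order methods only ever need curvature-matrix-vector products, which factor into cheap pieces. For a least-squares model with Jacobian J, the Gauss-Newton product is just two matrix-vector products, as in this sketch; in neural networks the same factorization is realized matrix-free with forward and reverse passes of automatic differentiation.

```python
import numpy as np

def gauss_newton_vec(J, v):
    """Gauss-Newton curvature-vector product G v = J^T (J v).

    The matrix G = J^T J is never formed; two matvecs suffice.
    """
    return J.T @ (J @ v)
```
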
14

Al-batah, Mohammad Subhi, Mutasem Sh Alkhasawneh, Lea Tien Tay, Umi Kalthum Ngah, Habibah Hj Lateh, and Nor Ashidi Mat Isa. "Landslide Occurrence Prediction Using Trainable Cascade Forward Network and Multilayer Perceptron." Mathematical Problems in Engineering 2015 (2015): 1–9. http://dx.doi.org/10.1155/2015/512158.

Abstract:
Landslides are one of the dangerous natural phenomena that hinder development in Penang Island, Malaysia. Therefore, finding a reliable method to predict the occurrence of landslides is still of research interest. In this paper, two artificial neural network models, namely the Multilayer Perceptron (MLP) and the Cascade Forward Neural Network (CFNN), are introduced to predict the landslide hazard map of Penang Island. These two models were tested and compared using eleven machine learning algorithms: Levenberg-Marquardt, Broyden-Fletcher-Goldfarb, Resilient Back Propagation, Scaled Conjugate Gradient, Conjugate Gradient with Beale, Conjugate Gradient with Fletcher-Reeves updates, Conjugate Gradient with Polak-Ribiere updates, One Step Secant, Gradient Descent, Gradient Descent with Momentum and Adaptive Learning Rate, and Gradient Descent with Momentum. Often, the performance of landslide prediction depends on the input factors besides the prediction method. In this research work, 14 input factors were used. The prediction accuracies of the networks were verified using the area under the curve of the receiver operating characteristic. The results indicated that the best prediction accuracy of 82.89% was achieved using the CFNN network with the Levenberg-Marquardt learning algorithm for the training data set and 81.62% for the testing data set.
15

Yang, Howard Hua, and Shun-ichi Amari. "Adaptive Online Learning Algorithms for Blind Separation: Maximum Entropy and Minimum Mutual Information." Neural Computation 9, no. 7 (October 1, 1997): 1457–82. http://dx.doi.org/10.1162/neco.1997.9.7.1457.

Abstract:
There are two major approaches to blind separation: maximum entropy (ME) and minimum mutual information (MMI). Both can be implemented by the stochastic gradient descent method for obtaining the demixing matrix. The MI is the contrast function for blind separation; the entropy is not. To justify the ME, the relation between ME and MMI is first elucidated by calculating the first derivative of the entropy and proving that mean subtraction is necessary in applying the ME, and that at the solution points determined by the MI, the ME will not update the demixing matrix in the directions of increasing the cross-talking. Second, the natural gradient instead of the ordinary gradient is introduced to obtain efficient algorithms, because the parameter space is a Riemannian space consisting of matrices. The mutual information is calculated by applying the Gram-Charlier expansion to approximate the probability density functions of the outputs. Finally, we propose an efficient learning algorithm that incorporates an adaptive method of estimating the unknown cumulants. It is shown by computer simulation that the convergence of the stochastic descent algorithms is improved by using the natural gradient and the adaptively estimated cumulants.
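
The natural-gradient update for the demixing matrix that this line of work made standard takes a compact form. Here is a minimal sketch with tanh as a fixed score function (a common choice for super-gaussian sources, whereas the paper estimates the relevant cumulants adaptively):

```python
import numpy as np

def natural_ica_step(W, x, lr=0.01):
    """Natural-gradient ICA update: W <- W + lr * (I - phi(y) y^T) W."""
    y = W @ x
    phi = np.tanh(y)  # fixed score function; an assumption for this sketch
    return W + lr * (np.eye(len(W)) - np.outer(phi, y)) @ W
```
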
16

Nitta, Tohru. "Learning Dynamics of a Single Polar Variable Complex-Valued Neuron." Neural Computation 27, no. 5 (May 2015): 1120–41. http://dx.doi.org/10.1162/neco_a_00729.

Abstract:
This letter investigates the characteristics of the complex-valued neuron model with parameters represented by polar coordinates (called polar variable complex-valued neuron). The parameters of the polar variable complex-valued neuron are unidentifiable. The plateau phenomenon can occur during learning of the polar variable complex-valued neuron. Furthermore, computer simulations suggest that a single polar variable complex-valued neuron has the following characteristics in the case of using the steepest gradient-descent method with square error: (1) unidentifiable parameters (singular points) degrade the learning speed and (2) a plateau can occur during learning. When the weight is attracted to the singular point, the learning tends to become stuck. However, computer simulations also show that the steepest gradient-descent method with amplitude-phase error and the complex-valued natural gradient method could reduce the effects of the singular points. The learning dynamics near singular points depends on the error functions and the training algorithms used.
17

Da Costa, Lancelot, Thomas Parr, Biswa Sengupta, and Karl Friston. "Neural Dynamics under Active Inference: Plausibility and Efficiency of Information Processing." Entropy 23, no. 4 (April 12, 2021): 454. http://dx.doi.org/10.3390/e23040454.

Abstract:
Active inference is a normative framework for explaining behaviour under the free energy principle—a theory of self-organisation originating in neuroscience. It specifies neuronal dynamics for state-estimation in terms of a descent on (variational) free energy—a measure of the fit between an internal (generative) model and sensory observations. The free energy gradient is a prediction error—plausibly encoded in the average membrane potentials of neuronal populations. Conversely, the expected probability of a state can be expressed in terms of neuronal firing rates. We show that this is consistent with current models of neuronal dynamics and establish face validity by synthesising plausible electrophysiological responses. We then show that these neuronal dynamics approximate natural gradient descent, a well-known optimisation algorithm from information geometry that follows the steepest descent of the objective in information space. We compare the information length of belief updating in both schemes, a measure of the distance travelled in information space that has a direct interpretation in terms of metabolic cost. We show that neural dynamics under active inference are metabolically efficient and suggest that neural representations in biological agents may evolve by approximating steepest descent in information space towards the point of optimal inference.
18

Boffi, Nicholas M., and Jean-Jacques E. Slotine. "Implicit Regularization and Momentum Algorithms in Nonlinearly Parameterized Adaptive Control and Prediction." Neural Computation 33, no. 3 (March 2021): 590–673. http://dx.doi.org/10.1162/neco_a_01360.

Abstract:
Stable concurrent learning and control of dynamical systems is the subject of adaptive control. Despite being an established field with many practical applications and a rich theory, much of the development in adaptive control for nonlinear systems revolves around a few key algorithms. By exploiting strong connections between classical adaptive nonlinear control techniques and recent progress in optimization and machine learning, we show that there exists considerable untapped potential in algorithm development for both adaptive nonlinear control and adaptive dynamics prediction. We begin by introducing first-order adaptation laws inspired by natural gradient descent and mirror descent. We prove that when there are multiple dynamics consistent with the data, these non-Euclidean adaptation laws implicitly regularize the learned model. Local geometry imposed during learning thus may be used to select parameter vectors—out of the many that will achieve perfect tracking or prediction—for desired properties such as sparsity. We apply this result to regularized dynamics predictor and observer design, and as concrete examples, we consider Hamiltonian systems, Lagrangian systems, and recurrent neural networks. We subsequently develop a variational formalism based on the Bregman Lagrangian. We show that its Euler-Lagrange equations lead to natural gradient and mirror descent-like adaptation laws with momentum, and we recover their first-order analogues in the infinite friction limit. We illustrate our analyses with simulations demonstrating our theoretical results.
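
As a toy illustration of the non-Euclidean geometry these adaptation laws exploit (not the paper's actual laws), mirror descent with an entropic potential yields multiplicative updates that keep the parameters on the probability simplex and implicitly bias solutions toward sparsity:

```python
import numpy as np

def exponentiated_gradient_step(theta, grad, lr=0.1):
    """Mirror descent with the entropic mirror map (exponentiated gradient)."""
    theta = theta * np.exp(-lr * grad)  # multiplicative update, stays positive
    return theta / theta.sum()          # normalize back onto the simplex
```
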
19

Clémençon, Stephan, Patrice Bertail, Emilie Chautru, and Guillaume Papa. "Optimal survey schemes for stochastic gradient descent with applications to M-estimation." ESAIM: Probability and Statistics 23 (2019): 310–37. http://dx.doi.org/10.1051/ps/2018021.

Abstract:
Iterative stochastic approximation methods are widely used to solve M-estimation problems, in the context of predictive learning in particular. In certain situations that shall undoubtedly become more and more common in the Big Data era, the datasets available are so massive that computing statistics over the full sample is hardly feasible, if not infeasible. A natural and popular approach to gradient descent in this context consists in substituting the “full data” statistics with their counterparts based on subsamples picked at random of manageable size. It is the main purpose of this paper to investigate the impact of survey sampling with unequal inclusion probabilities on stochastic gradient descent-based M-estimation methods. Precisely, we prove that, in the presence of some a priori information, one may significantly increase statistical accuracy in terms of limit variance when choosing appropriate first-order inclusion probabilities. These results are described by asymptotic theorems and are also supported by illustrative numerical experiments.
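
The basic construction can be sketched as follows: draw an index from the chosen inclusion probabilities and reweight its gradient by the inverse probability (a Horvitz-Thompson correction) so the update stays unbiased for the full-data gradient. This is only a schematic, assuming `pi` sums to one; the paper analyses more general survey designs.

```python
import numpy as np

def survey_sgd_step(theta, grad_fn, data, pi, lr=0.01):
    """One SGD step with unequal inclusion probabilities pi (sum(pi) == 1)."""
    i = np.random.choice(len(data), p=pi)
    # inverse-probability weighting keeps the estimate unbiased for the mean gradient
    return theta - lr * grad_fn(theta, data[i]) / (len(data) * pi[i])
```
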
20

Duan, Xiaomin. "A Natural Gradient Descent Algorithm for the Solution of Lyapunov Equations Based on the Geodesic Distance." Journal of Computational Mathematics 32, no. 1 (June 2014): 93–106. http://dx.doi.org/10.4208/jcm.1310-m4225.

21

Wu, Jiann-Ming. "Natural Discriminant Analysis Using Interactive Potts Models." Neural Computation 14, no. 3 (March 1, 2002): 689–713. http://dx.doi.org/10.1162/089976602317250951.

Abstract:
Natural discriminant analysis based on interactive Potts models is developed in this work. A generative model composed of piece-wise multivariate gaussian distributions is used to characterize the input space, exploring the embedded clustering and mixing structures and developing proper internal representations of input parameters. The maximization of a log-likelihood function measuring the fitness of all input parameters to the generative model, and the minimization of a design cost summing up square errors between posterior outputs and desired outputs, constitute a mathematical framework for discriminant analysis. We apply a hybrid of the mean-field annealing and gradient-descent methods to the optimization of this framework and obtain multiple sets of interactive dynamics, which realize coupled Potts models for discriminant analysis. The new learning process is a whole process of component analysis, clustering analysis, and labeling analysis. Its major improvement over the radial basis function and the support vector machine is illustrated using some artificial examples and a real-world application to breast cancer diagnosis.
22

Duan, Xiaomin, Huafei Sun, Linyu Peng, and Xinyu Zhao. "A natural gradient descent algorithm for the solution of discrete algebraic Lyapunov equations based on the geodesic distance." Applied Mathematics and Computation 219, no. 19 (June 2013): 9899–905. http://dx.doi.org/10.1016/j.amc.2013.03.119.

23

Wibisono, Andre, Ashia C. Wilson, and Michael I. Jordan. "A variational perspective on accelerated methods in optimization." Proceedings of the National Academy of Sciences 113, no. 47 (November 9, 2016): E7351–E7358. http://dx.doi.org/10.1073/pnas.1614734113.

Abstract:
Accelerated gradient methods play a central role in optimization, achieving optimal rates in many settings. Although many generalizations and extensions of Nesterov’s original acceleration method have been proposed, it is not yet clear what is the natural scope of the acceleration concept. In this paper, we study accelerated methods from a continuous-time perspective. We show that there is a Lagrangian functional that we call the Bregman Lagrangian, which generates a large class of accelerated methods in continuous time, including (but not limited to) accelerated gradient descent, its non-Euclidean extension, and accelerated higher-order gradient methods. We show that the continuous-time limit of all of these methods corresponds to traveling the same curve in spacetime at different speeds. From this perspective, Nesterov’s technique and many of its generalizations can be viewed as a systematic way to go from the continuous-time curves generated by the Bregman Lagrangian to a family of discrete-time accelerated algorithms.
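
For reference, the Bregman Lagrangian at the center of the paper takes the form (up to notational conventions)

\[
\mathcal{L}(X, V, t) = e^{\alpha_t + \gamma_t}\Big( D_h\big(X + e^{-\alpha_t} V,\, X\big) - e^{\beta_t} f(X) \Big),
\]

where D_h is the Bregman divergence of a convex distance-generating function h and α_t, β_t, γ_t are time-dependent scaling functions. Under the paper's ideal scaling conditions, the Euler-Lagrange equations of this functional yield continuous-time dynamics whose discretizations recover the accelerated methods; traversing the same curve at different speeds corresponds to different convergence rates.
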
24

Movellan, Javier R. "A Learning Theorem for Networks at Detailed Stochastic Equilibrium." Neural Computation 10, no. 5 (July 1, 1998): 1157–78. http://dx.doi.org/10.1162/089976698300017395.

Abstract:
This article analyzes learning in continuous stochastic neural networks defined by stochastic differential equations (SDE). In particular, it studies gradient descent learning rules to train the equilibrium solutions of these networks. A theorem is given that specifies sufficient conditions for the gradient descent learning rules to be local covariance statistics between two random variables: (1) an evaluator that is the same for all the network parameters and (2) a system variable that is independent of the learning objective. While this article focuses on continuous stochastic neural networks, the theorem applies to any other system with Boltzmann-like equilibrium distributions. The generality of the theorem suggests that instead of suppressing noise present in physical devices, a natural alternative is to use it to simplify the credit assignment problem. In deterministic networks, credit assignment requires an evaluation signal that is different for each node in the network. Surprisingly, when noise is not suppressed, all that is needed is an evaluator that is the same for the entire network and a local Hebbian signal. This modularization of signals greatly simplifies hardware and software implementations. The article shows how the theorem applies to four different learning objectives that span supervised, reinforcement, and unsupervised problems: (1) regression, (2) density estimation, (3) risk minimization, and (4) information maximization. Simulations, implementation issues, and implications for computational neuroscience are discussed.
25

Grundstrom, Eric L., and James A. Reggia. "Learning Activation Rules Rather Than Connection Weights." International Journal of Neural Systems 7, no. 2 (May 1996): 129–47. http://dx.doi.org/10.1142/s0129065796000117.

Abstract:
In the construction of neural networks involving associative recall, information is sometimes best encoded with a local representation. Moreover, a priori knowledge can lead to a natural selection of connection weights for these networks. With predetermined and fixed weights, standard learning algorithms that work by altering connection strengths are unable to train such networks. To address this problem, this paper derives a supervised learning rule based on gradient descent, where connection weights are fixed and a network is trained by changing the activation rule. It incorporates both traditional and competitive activation mechanisms, the latter being an efficient method for instilling competition in a network. The learning rule has been implemented, and the results from several test networks demonstrate that it works effectively.
26

Gelenbe, Erol, and Stelios Timotheou. "Random Neural Networks with Synchronized Interactions." Neural Computation 20, no. 9 (September 2008): 2308–24. http://dx.doi.org/10.1162/neco.2008.04-07-509.

Abstract:
Large-scale distributed systems, such as natural neuronal and artificial systems, have many local interconnections, but they often also have the ability to propagate information very fast over relatively large distances. Mechanisms that enable such behavior include very long physical signaling paths and possibly saccades of synchronous behavior that may propagate across a network. This letter studies the modeling of such behaviors in neuronal networks and develops a related learning algorithm. This is done in the context of the random neural network (RNN), a probabilistic model with a well-developed mathematical theory, which was inspired by the apparently stochastic spiking behavior of certain natural neuronal systems. Thus, we develop an extension of the RNN to the case when synchronous interactions can occur, leading to synchronous firing by large ensembles of cells. We also present an O(N³) gradient descent learning algorithm for an N-cell recurrent network having both conventional excitatory-inhibitory interactions and synchronous interactions. Finally, the model and its learning algorithm are applied to a resource allocation problem that is NP-hard and requires fast approximate decisions.
27

Bautembach, Dennis, Iason Oikonomidis, and Antonis Argyros. "Filling the Joints: Completion and Recovery of Incomplete 3D Human Poses." Technologies 6, no. 4 (October 30, 2018): 97. http://dx.doi.org/10.3390/technologies6040097.

Abstract:
We present a comparative study of three matrix completion and recovery techniques based on matrix inversion, gradient descent, and Lagrange multipliers, applied to the problem of human pose estimation. 3D human pose estimation algorithms may exhibit noise or may completely fail to provide estimates for some joints. A post-process is often employed to recover the missing joints’ locations from the remaining ones, typically by enforcing kinematic constraints or by using a prior learned from a database of natural poses. Matrix completion and recovery techniques fall into the latter category and operate by filling-in missing entries of a matrix whose available/non-missing entries may be additionally corrupted by noise. We compare the performance of three such techniques in terms of the estimation error of their output as well as their runtime, in a series of simulated and real-world experiments. We conclude by recommending use cases for each of the compared techniques.
28

Sun, Yijun, Sinisa Todorovic, and Jian Li. "Reducing the Overfitting of AdaBoost by Controlling Its Data Distribution Skewness." International Journal of Pattern Recognition and Artificial Intelligence 20, no. 7 (November 2006): 1093–116. http://dx.doi.org/10.1142/s0218001406005137.

Abstract:
AdaBoost rarely suffers from overfitting problems in low noise data cases. However, recent studies with highly noisy patterns have clearly shown that overfitting can occur. A natural strategy to alleviate the problem is to penalize the data distribution skewness in the learning process, to prevent the few hardest examples from spoiling the decision boundaries. In this paper, we pursue such a penalty scheme in the mathematical programming setting, which allows us to define a suitable classifier soft margin. By using two smooth convex penalty functions, based on the Kullback–Leibler divergence (KL) and the l2 norm, we derive two new regularized AdaBoost algorithms, referred to as AdaBoostKL and AdaBoostNorm2, respectively. We prove that our algorithms perform stage-wise gradient descent on a cost function defined in the domain of their associated soft margins. We demonstrate the effectiveness of the proposed algorithms through experiments over a wide variety of data sets. Compared with other regularized AdaBoost algorithms, our methods achieve at least the same or better performance.
29

Grüning, André. "Elman Backpropagation as Reinforcement for Simple Recurrent Networks." Neural Computation 19, no. 11 (November 2007): 3108–31. http://dx.doi.org/10.1162/neco.2007.19.11.3108.

Abstract:
Simple recurrent networks (SRNs) in symbolic time-series prediction (e.g., language processing models) are frequently trained with gradient descent-based learning algorithms, notably with variants of backpropagation (BP). A major drawback for the cognitive plausibility of BP is that it is a supervised scheme in which a teacher has to provide a fully specified target answer. Yet agents in natural environments often receive summary feedback about the degree of success or failure only, a view adopted in reinforcement learning schemes. In this work, we show that for SRNs in prediction tasks for which there is a probability interpretation of the network's output vector, Elman BP can be reimplemented as a reinforcement learning scheme for which the expected weight updates agree with the ones from traditional Elman BP. Network simulations on formal languages corroborate this result and show that the learning behaviors of Elman backpropagation and its reinforcement variant are very similar also in online learning tasks.
30

Fominyh, Alexander V. "Method for finding a solution to a linear nonstationary interval ODE system." Vestnik of Saint Petersburg University. Applied Mathematics. Computer Science. Control Processes 17, no. 2 (2021): 148–65. http://dx.doi.org/10.21638/11701/spbu10.2021.205.

Abstract:
The article analyses a linear nonstationary interval system of ordinary differential equations in which the elements of the system matrix are intervals with known lower and upper bounds. The system is defined on a known finite time interval. It is required to construct a trajectory which brings this system from the given initial position to the given final state. The original problem is reduced to finding a solution of a differential inclusion of a special form with the fixed right endpoint. With the help of support functions, this problem is reduced to minimizing a functional in the space of piecewise continuous functions. Under a natural additional assumption, this functional is Gateaux differentiable. The Gateaux gradient of the functional is found, and necessary and sufficient conditions for a minimum are obtained. On the basis of these conditions, the method of steepest descent is applied to the original problem. Some examples illustrate the operation of the constructed algorithm.
31

Gamal, Donia, Marco Alfonse, El-Sayed M. El-Horbaty, and Abdel-Badeeh M. Salem. "Analysis of Machine Learning Algorithms for Opinion Mining in Different Domains." Machine Learning and Knowledge Extraction 1, no. 1 (December 8, 2018): 224–34. http://dx.doi.org/10.3390/make1010014.

Abstract:
Sentiment classification (SC) refers to the task of sentiment analysis (SA), a subfield of natural language processing (NLP) used to decide whether textual content implies a positive or negative review. This research focuses on the various machine learning (ML) algorithms utilized in the analysis of sentiments and in the mining of reviews across different datasets. Overall, an SC task consists of two phases. The first phase deals with feature extraction (FE); three different FE algorithms are applied in this research. The second phase covers the classification of the reviews by using various ML algorithms. These are Naïve Bayes (NB), Stochastic Gradient Descent (SGD), Support Vector Machines (SVM), Passive Aggressive (PA), Maximum Entropy (ME), Adaptive Boosting (AdaBoost), Multinomial NB (MNB), Bernoulli NB (BNB), Ridge Regression (RR) and Logistic Regression (LR). The performance of PA with unigrams is the best among the algorithms for all used datasets (IMDB, Cornell Movies, Amazon and Twitter), providing values that range from 87% to 99.96% across all evaluation metrics.
32

Pal, Dipan K., and Marios Savvides. "Non-Parametric Transformation Networks for Learning General Invariances from Data." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 4667–74. http://dx.doi.org/10.1609/aaai.v33i01.33014667.

Abstract:
ConvNets, through their architecture, only enforce invariance to translation. In this paper, we introduce a new class of deep convolutional architectures called Non-Parametric Transformation Networks (NPTNs) which can learn general invariances and symmetries directly from data. NPTNs are a natural generalization of ConvNets and can be optimized directly using gradient descent. Unlike almost all previous works in deep architectures, they make no assumption regarding the structure of the invariances present in the data and in that aspect are flexible and powerful. We also model ConvNets and NPTNs under a unified framework called Transformation Networks (TN), which yields a better understanding of the connection between the two. We demonstrate the efficacy of NPTNs on data such as MNIST with extreme transformations and CIFAR10 where they outperform baselines, and further outperform several recent algorithms on ETH-80. They do so while having the same number of parameters. We also show that they are more effective than ConvNets in modelling symmetries and invariances from data, without the explicit knowledge of the added arbitrary nuisance transformations. Finally, we replace ConvNets with NPTNs within Capsule Networks and show that this enables Capsule Nets to perform even better.
33

Ibrahim, Syahira, Norhaliza Abdul Wahab, Fatimah Sham Ismail, and Yahaya Md Sam. "Optimization of artificial neural network topology for membrane bioreactor filtration using response surface methodology." IAES International Journal of Artificial Intelligence (IJ-AI) 9, no. 1 (March 1, 2020): 117. http://dx.doi.org/10.11591/ijai.v9.i1.pp117-125.

Abstract:
The optimization of artificial neural network (ANN) topology for predicting the permeate flux of palm oil mill effluent (POME) in membrane bioreactor (MBR) filtration has been investigated using response surface methodology (RSM). A radial basis function neural network (RBFNN) model, trained by gradient descent with momentum (GDM) algorithms, was developed to correlate the output (permeate flux) to the four exogenous input variables (airflow rate, transmembrane pressure, permeate pump and aeration pump). A second-order polynomial model was developed from the training results for the natural log mean square error of 50 developed ANNs to generate 3D response surfaces. The optimum ANN topology had minimum ln MSE when the number of hidden neurons, spread, momentum coefficient, learning rate and number of epochs were 16, 1.4, 0.28, 0.3 and 1852, respectively. The MSE and regression coefficient of the ANN model were determined as 0.0022 and 0.9906 for training, 0.0052 and 0.9839 for testing, and 0.0217 and 0.9707 for validation data sets. These results confirmed that combining RSM and ANN is precise for predicting the permeate flux of POME in an MBR system. This development may have significant potential to improve model accuracy and reduce computational time.
34

Guttenberg, Nicholas, Nathaniel Virgo, and Alexandra Penn. "On the Potential for Open-Endedness in Neural Networks." Artificial Life 25, no. 2 (May 2019): 145–67. http://dx.doi.org/10.1162/artl_a_00286.

Abstract:
Natural evolution gives the impression of leading to an open-ended process of increasing diversity and complexity. If our goal is to produce such open-endedness artificially, this suggests an approach driven by evolutionary metaphor. On the other hand, techniques from machine learning and artificial intelligence are often considered too narrow to provide the sort of exploratory dynamics associated with evolution. In this article, we hope to bridge that gap by reviewing common barriers to open-endedness in the evolution-inspired approach and how they are dealt with in the evolutionary case—collapse of diversity, saturation of complexity, and failure to form new kinds of individuality. We then show how these problems map onto similar ones in the machine learning approach, and discuss how the same insights and solutions that alleviated those barriers in evolutionary approaches can be ported over. At the same time, the form these issues take in the machine learning formulation suggests new ways to analyze and resolve barriers to open-endedness. Ultimately, we hope to inspire researchers to be able to interchangeably use evolutionary and gradient-descent-based machine learning methods to approach the design and creation of open-ended systems.
35

Su, Fang, Hai-Yang Shang, and Jing-Yan Wang. "Low-Rank Deep Convolutional Neural Network for Multitask Learning." Computational Intelligence and Neuroscience 2019 (May 20, 2019): 1–10. http://dx.doi.org/10.1155/2019/7410701.

Abstract:
In this paper, we propose a novel multitask learning method based on a deep convolutional network. The proposed deep network has four convolutional layers, three max-pooling layers, and two parallel fully connected layers. To adapt the deep network to the multitask learning problem, we propose to learn a low-rank deep network so that the relations among different tasks can be explored. We propose to minimize the number of independent parameter rows of one fully connected layer to explore the relations among different tasks, measured by the nuclear norm of that layer's parameter matrix, and thus seek a low-rank parameter matrix. Meanwhile, we also propose to regularize another fully connected layer by a sparsity penalty so that the useful features learned by the lower layers can be selected. The learning problem is solved by an iterative algorithm based on the gradient descent and back-propagation algorithms. The proposed algorithm is evaluated over benchmark datasets of multiple face attribute prediction, multitask natural language processing, and joint economics index prediction. The evaluation results show the advantage of the low-rank deep CNN model on multitask problems.
36

Hartman, Eric, and James D. Keeler. "Predicting the Future: Advantages of Semilocal Units." Neural Computation 3, no. 4 (December 1991): 566–78. http://dx.doi.org/10.1162/neco.1991.3.4.566.

Abstract:
In investigating gaussian radial basis function (RBF) networks for their ability to model nonlinear time series, we have found that while RBF networks are much faster than standard sigmoid unit backpropagation for low-dimensional problems, their advantages diminish in high-dimensional input spaces. This is particularly troublesome if the input space contains irrelevant variables. We suggest that this limitation is due to the localized nature of RBFs. To gain the advantages of the highly nonlocal sigmoids and the speed advantages of RBFs, we propose a particular class of semilocal activation functions that is a natural interpolation between these two families. We present evidence that networks using these gaussian bar units avoid the slow learning problem of sigmoid unit networks, and, very importantly, are more accurate than RBF networks in the presence of irrelevant inputs. On the Mackey-Glass and Coupled Lattice Map problems, the speedup over sigmoid networks is so dramatic that the difference in training time between RBF and gaussian bar networks is minor. Gaussian bar architectures that superpose composed gaussians (gaussians-of-gaussians) to approximate the unknown function have the best performance. We postulate that an interesting behavior displayed by gaussian bar functions under gradient descent dynamics, which we call automatic connection pruning, is an important factor in the success of this representation.
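
The contrast between the two unit types can be made explicit: a standard RBF unit is equivalent to a product of one-dimensional gaussians (fully local), while a gaussian bar unit is a weighted sum of them (semilocal), so an irrelevant input dimension can simply receive a near-zero weight. A minimal sketch with hypothetical per-dimension parameters:

```python
import numpy as np

def rbf_unit(x, mu, sigma):
    """Local: a product of 1-D gaussians over the input dimensions."""
    return np.exp(-np.sum((x - mu) ** 2) / (2 * sigma ** 2))

def gaussian_bar_unit(x, mu, sigma, w):
    """Semilocal: a weighted sum of 1-D gaussians, one per input dimension."""
    return np.sum(w * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)))
```
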
37

Duong, Tuan A., Margaret A. Ryan, and Vu A. Duong. "Space Invariant Independent Component Analysis and ENose for Detection of Selective Chemicals in an Unknown Environment." Journal of Advanced Computational Intelligence and Intelligent Informatics 11, no. 10 (December 20, 2007): 1197–203. http://dx.doi.org/10.20965/jaciii.2007.p1197.

Abstract:
In this paper, we present a space-invariant architecture that enables Independent Component Analysis (ICA) to solve chemical detection from two unknown mixing chemical sources. The two sets of unknown paired mixture sources are collected via the JPL 16-ENose sensor array in an unknown environment, with at most 12 data samples collected. Our space-invariant architecture, along with the maximum entropy information technique by Bell and Sejnowski and natural gradient descent by Amari, has demonstrated that it is effective to separate the two unknown mixed chemical sources, with unknown mixing levels, into the two original sources even with insufficient sampled data. The separated sources can then be identified by projecting them onto the 11 known chemical sources to find the best match for detection. We also present the results of our simulations. These simulations have shown that 100% correct detection could be achieved in two cases: (a) the under-complete case, where the number of inputs (mixtures) is larger than the number of original chemical sources; and (b) the regular case, where the number of inputs is the same as the number of sources. The time-invariant architecture approach, in contrast, may face obstacles: the overcomplete case, insufficient data and a cumbersome architecture.
38

Maillard, Jean, Stephen Clark, and Dani Yogatama. "Jointly learning sentence embeddings and syntax with unsupervised Tree-LSTMs." Natural Language Engineering 25, no. 4 (July 2019): 433–49. http://dx.doi.org/10.1017/s1351324919000184.

Abstract:
We present two studies on neural network architectures that learn to represent sentences by composing their words according to automatically induced binary trees, without ever being shown a correct parse tree. We use Tree-Long Short-Term Memories (LSTMs) as our composition function, applied along a tree structure found by a differentiable natural language chart parser. The models simultaneously optimise both the composition function and the parser, thus eliminating the need for externally provided parse trees, which are normally required for Tree-LSTMs. They can therefore be seen as tree-based recurrent neural networks that are unsupervised with respect to the parse trees. Due to being fully differentiable, the models are easily trained with an off-the-shelf gradient descent method and backpropagation. In the first part of this paper, we introduce a model based on the CKY chart parser, and evaluate its downstream performance on a natural language inference task and a reverse dictionary task. Further, we show how its performance can be improved with an attention mechanism which fully exploits the parse chart, by attending over all possible subspans of the sentence. We find that our approach is competitive against similar models of comparable size and outperforms Tree-LSTMs that use trees produced by a parser. Finally, we present an alternative architecture based on a shift-reduce parser. We perform an analysis of the trees induced by both our models, to investigate whether they are consistent with each other and across re-runs, and whether they resemble the trees produced by a standard parser.
39

Diao, Huabin, Yuexing Hao, Shaoyun Xu, and Gongyan Li. "Implementation of Lightweight Convolutional Neural Networks via Layer-Wise Differentiable Compression." Sensors 21, no. 10 (May 16, 2021): 3464. http://dx.doi.org/10.3390/s21103464.

Abstract:
Convolutional neural networks (CNNs) have achieved significant breakthroughs in various domains, such as natural language processing (NLP) and computer vision. However, performance improvement is often accompanied by large model size and computation costs, which make it unsuitable for resource-constrained devices. Consequently, there is an urgent need to compress CNNs, so as to reduce model size and computation costs. This paper proposes a layer-wise differentiable compression (LWDC) algorithm for compressing CNNs structurally. A differentiable selection operator OS is embedded in the model to compress and train the model simultaneously by gradient descent in one go. Instead of pruning parameters from redundant operators, as most of the existing methods do, our method replaces the original bulky operators with more lightweight ones directly, which only requires specifying the set of lightweight operators and the regularization factor in advance, rather than the compression rate for each layer. The compressed model produced by our method is generic and does not need any special hardware/software support. Experimental results on CIFAR-10, CIFAR-100 and ImageNet have demonstrated the effectiveness of our method. LWDC obtains more significant compression than state-of-the-art methods in most cases, while having lower performance degradation. The impact of the lightweight operators and the regularization factor on the compression rate and accuracy is also evaluated.
40

Petrov, Petr V., and Gregory A. Newman. "Estimation of seismic source parameters in 3D elastic media using the reciprocity theorem." GEOPHYSICS 84, no. 6 (November 1, 2019): R963–R976. http://dx.doi.org/10.1190/geo2018-0283.1.

Abstract:
We have developed a novel method based upon reciprocity principles to simultaneously estimate the location of a seismic event and its source mechanism in 3D heterogeneous media. The method finds double-couple (DC) and non-DC mechanisms of microearthquakes arising from localized induced and natural seismicity. Because the method uses an exhaustive search of the 3D elastic media, it is globally convergent. It does not suffer from local minima realization observed with local optimization methods, including Newton, Gauss-Newton, or gradient-descent algorithms. The computational efficiency of our scheme is derived from the reciprocity principle, in which the number of 3D model realizations corresponds to the number of measurement receivers. The 3D forward modeling is carried out in the damped Fourier domain with a 3D finite-difference frequency-domain fourth- and second-order code developed to simulate elastic waves generated by seismic sources defined by forces and second-order moment density tensors. We evaluate the results of testing this new methodology on synthetic data for the Raft River geothermal field, Idaho, as well as determine its applicability in designing optimal borehole monitoring arrays in a fracking experiment at the Homestake Mine, South Dakota. We also find that the method proposed here can retrieve the moment tensors of the space distributed source with data arising from spatially restricted arrays with limited aperture. The effects of uncertainties on the source parameter estimation are also examined with respect to data noise and model uncertainty.
41

Roman, Muhammad, Abdul Shahid, Muhammad Irfan Uddin, Qiaozhi Hua, and Shazia Maqsood. "Exploiting Contextual Word Embedding of Authorship and Title of Articles for Discovering Citation Intent Classification." Complexity 2021 (April 3, 2021): 1–13. http://dx.doi.org/10.1155/2021/5554874.

Abstract:
The number of scientific publications is growing exponentially. Research articles cite other work for various reasons and, therefore, have been studied extensively to associate documents. It is argued that not all references carry the same level of importance, so it is essential to understand the reason for a citation, called the citation intent or function. Text information can contribute well if new natural language processing techniques are applied to capture the context of text data. In this paper, we have used contextualized word embeddings to find numerical representations of text features. We further investigated the performance of various machine-learning techniques on these numerical representations. The performance of each of the classifiers was evaluated on two state-of-the-art datasets containing the text features. In the case of the unbalanced dataset, we observed that the linear Support Vector Machine (SVM) achieved 86% accuracy for the “background” class, where the training data were extensive. For the rest of the classes, including “motivation,” “extension,” and “future,” the machine was trained on fewer than 100 records; therefore, the accuracy was only 57 to 64%. In the case of the balanced dataset, each class achieved comparable accuracy, being trained on the same amount of data. Overall, SVM performed best on both datasets, followed by the stochastic gradient descent classifier; therefore, SVM can produce good results for text classification on top of contextual word embeddings.
42

Dong, Yi, Stefan Mihalas, Alexander Russell, Ralph Etienne-Cummings, and Ernst Niebur. "Estimating Parameters of Generalized Integrate-and-Fire Neurons from the Maximum Likelihood of Spike Trains." Neural Computation 23, no. 11 (November 2011): 2833–67. http://dx.doi.org/10.1162/neco_a_00196.

Abstract:
When a neuronal spike train is observed, what can we deduce from it about the properties of the neuron that generated it? A natural way to answer this question is to make an assumption about the type of neuron, select an appropriate model for this type, and then choose the model parameters as those that are most likely to generate the observed spike train. This is the maximum likelihood method. If the neuron obeys simple integrate-and-fire dynamics, Paninski, Pillow, and Simoncelli (2004) showed that its negative log-likelihood function is convex and that, at least in principle, its unique global minimum can thus be found by gradient descent techniques. Many biological neurons are, however, known to generate a richer repertoire of spiking behaviors than can be explained in a simple integrate-and-fire model. For instance, such a model retains only an implicit (through spike-induced currents), not an explicit, memory of its input; an example of a physiological situation that cannot be explained is the absence of firing if the input current is increased very slowly. Therefore, we use an expanded model (Mihalas & Niebur, 2009), which is capable of generating a large number of complex firing patterns while still being linear. Linearity is important because it maintains the distribution of the random variables and still allows maximum likelihood methods to be used. In this study, we show that although convexity of the negative log-likelihood function is not guaranteed for this model, the minimum of this function yields a good estimate for the model parameters, in particular if the noise level is treated as a free parameter. Furthermore, we show that a nonlinear function minimization method (r-algorithm with space dilation) usually reaches the global minimum.
43

Saxena, Anshul, Peter McGranaghan, Muni Rubens, Joseph Salami, Raees Tonse, Amanda Lindeman, Michelle Keller, Paul Lindeman, and Emir Veledar. "Natural language processing (NLP) and machine learning (ML) model for predicting CMS OP-35 categories among patients receiving chemotherapy." Journal of Clinical Oncology 39, no. 15_suppl (May 20, 2021): e13591-e13591. http://dx.doi.org/10.1200/jco.2021.39.15_suppl.e13591.

Abstract:
Background: The Hospital Outpatient Quality Reporting Program is a pay-for-quality data reporting program implemented by the Centers for Medicare & Medicaid Services (CMS). Hospitals collect data on various measures of the quality of care provided in outpatient settings for the CMS. One such measure is OP-35, where data about patients who received chemotherapy in outpatient settings are collected. Such quality measures help hospitals assess their performance and allow patients to compare the quality of care among different hospitals in that region. Currently, the process to label data for OP-35 categories is manual. This study aims to develop a model using NLP and ML to predict the ten OP-35 complication categories and automate the process. Methods: Data from 1000 adult cancer patients who received chemotherapy at a comprehensive cancer center in the South Florida region between Sept and Oct 2019 were extracted to train the ML models. Text from the Chief Complaint field was manually labeled into ten binary categories: anemia, nausea, dehydration, neutropenia, diarrhea, emesis, pneumonia, fever, sepsis, and pain. The data were divided into a training set (80%) and a test set (20%). After initial pre-processing of the text, the term frequency–inverse document frequency (TF-IDF) feature extraction method with a vocabulary size of 10,000 was applied. Various models (stochastic gradient descent, support vector classification [SVC], binary relevance, etc.) were trained to predict multiple labels. These models were evaluated using the Jaccard score, accuracy, F1 score, and Hamming loss. Additionally, two deep learning approaches, a single dense output layer and multiple dense output layers, were also used for comparison. Python version 3.8 was utilized for the analysis. Results: The best performing model was SVC, with a Jaccard score of 85.13 and 90% accuracy. In the first deep learning approach, a single dense output layer was used with multiple neurons, where each neuron represented only one label. In the second approach, a separate dense layer with one neuron was created for each label. The model with a single output layer produced an accuracy score of 32%, and the model with multiple output layers had an accuracy score of 31%. Both deep learning models performed poorly compared to SVC. Conclusions: Our study gives an early indication of the feasibility of modern ML techniques in predicting multiple label categories or outcomes. As a potential clinical decision support system, this model could replace manual data entry, minimize human error, and decrease the resources needed for data collection. In the next stage, healthcare providers will validate this model by manually checking the predicted labels. In the final stage, the model will be deployed in real time to predict OP-35 categories automatically.
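
A pipeline of the general kind described, TF-IDF features feeding a linear support vector classifier, can be assembled in a few lines with scikit-learn. The snippet below is a schematic with hypothetical toy data, not the study's code; a faithful multilabel version would additionally wrap the classifier in a one-vs-rest strategy over the ten binary categories.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# hypothetical chief-complaint snippets and labels, for illustration only
texts = ["patient reports persistent nausea", "febrile, suspected sepsis"]
labels = ["nausea", "fever"]

clf = make_pipeline(TfidfVectorizer(max_features=10000), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["nausea and vomiting after chemotherapy"]))
```
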
APA, Harvard, Vancouver, ISO, and other styles
44

Kim, Youngjun, Paul M. Heider, Isabel RH Lally, and Stéphane M. Meystre. "A Hybrid Model for Family History Information Identification and Relation Extraction: Development and Evaluation of an End-to-End Information Extraction System." JMIR Medical Informatics 9, no. 4 (April 22, 2021): e22797. http://dx.doi.org/10.2196/22797.

Full text
Abstract:
Background Family history information is important to assess the risk of inherited medical conditions. Natural language processing has the potential to extract this information from unstructured free-text notes to improve patient care and decision making. We describe the end-to-end information extraction system the Medical University of South Carolina team developed when participating in the 2019 National Natural Language Processing Clinical Challenge (n2c2)/Open Health Natural Language Processing (OHNLP) shared task. Objective This task involves identifying mentions of family members and observations in electronic health record text notes and recognizing the 2 types of relations (family member-living status relations and family member-observation relations). Our system aims to achieve a high level of performance by integrating heuristics and advanced information extraction methods. Our efforts also include improving the performance of 2 subtasks by exploiting additional labeled data and clinical text-based embedding models. Methods We present a hybrid method that combines machine learning and rule-based approaches. We implemented an end-to-end system with multiple information extraction and attribute classification components. For entity identification, we trained bidirectional long short-term memory deep learning models. These models incorporated static word embeddings and context-dependent embeddings. We created a voting ensemble that combined the predictions of all individual models. For relation extraction, we trained 2 relation extraction models. The first model determined the living status of each family member. The second model identified observations associated with each family member. We implemented online gradient descent models to extract related entity pairs. As part of postchallenge efforts, we used the BioCreative/OHNLP 2018 corpus and trained new models with the union of these 2 datasets. We also pretrained language models using clinical notes from the Medical Information Mart for Intensive Care (MIMIC-III) clinical database. Results The voting ensemble achieved better performance than individual classifiers. In the entity identification task, our top-performing system reached a precision of 78.90% and a recall of 83.84%. Our natural language processing system for entity identification took 3rd place out of 17 teams in the challenge. We ranked 4th out of 9 teams in the relation extraction task. Our system substantially benefited from the combination of the 2 datasets. Compared to our official submission with F1 scores of 81.30% and 64.94% for entity identification and relation extraction, respectively, the revised system yielded significantly better performance (P<.05) with F1 scores of 86.02% and 72.48%, respectively. Conclusions We demonstrated that a hybrid model could be used to successfully extract family history information recorded in unstructured free-text notes. In this study, our approach to entity identification as a sequence labeling problem produced satisfactory results. Our postchallenge efforts significantly improved performance by leveraging additional labeled data and using word vector representations learned from large collections of clinical notes.
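As a sketch of the online gradient descent relation component mentioned above, the following updates a logistic scorer for candidate (family member, observation) pairs one example at a time; the features, example pairs, and learning rate are hypothetical stand-ins, not the system's actual design.

```python
# Illustrative sketch: online gradient descent on log-loss for scoring
# candidate (family member, observation) pairs. Features are invented cues.
import numpy as np

def features(pair):
    member, observation, token_distance = pair
    return np.array([1.0,                           # bias
                     1.0 / (1.0 + token_distance),  # proximity cue
                     float(member in ("mother", "father")),
                     float("cancer" in observation)])

# (member, observation, token distance between mentions) -> related?
data = [(("mother", "breast cancer", 2), 1),
        (("father", "hypertension", 15), 0),
        (("sister", "diabetes", 3), 1),
        (("mother", "no issues", 20), 0)]

w, lr = np.zeros(4), 0.5
for epoch in range(20):
    for pair, y in data:                  # online: update after each example
        x = features(pair)
        p = 1.0 / (1.0 + np.exp(-w @ x))  # predicted relation probability
        w += lr * (y - p) * x             # gradient step on log-loss

for pair, y in data:
    p = 1.0 / (1.0 + np.exp(-w @ features(pair)))
    print(pair[0], pair[1], "->", round(float(p), 2), "(gold:", y, ")")
```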
APA, Harvard, Vancouver, ISO, and other styles
45

Garg, Raghu, Himanshu Aggarwal, Piera Centobelli, and Roberto Cerchione. "Extracting Knowledge from Big Data for Sustainability: A Comparison of Machine Learning Techniques." Sustainability 11, no. 23 (November 25, 2019): 6669. http://dx.doi.org/10.3390/su11236669.

Full text
Abstract:
At present, given the scarcity of natural resources, society should take maximum advantage of data, information, and knowledge to achieve sustainability goals. Human existence is not possible without the proliferation of plants. Through photosynthesis, plants convert solar energy into chemical energy. This process supports all life on earth, and the main controlling factor for proper plant growth is soil, since it holds water, air, and all the nutrients essential for plant nourishment. Overexposed soil becomes degraded, however, so fertilizer is essential for maintaining soil quality. In that regard, soil analysis is a suitable method for determining soil quality. Soil analysis examines the soil in laboratories and generates reports of unorganized data. In this study, different big-data machine learning methods are used to extract knowledge from these data and determine fertilizer recommendation classes from the present soil nutrient composition. For this experiment, soil analysis reports were collected from the Tata soil and water testing center. In this paper, the Apache Mahout library is used to analyze the performance of stochastic gradient descent (SGD) and an artificial neural network (ANN) in a Hadoop environment. For a broader performance evaluation, we also ran single-machine experiments with random forest (RF), K-nearest neighbors (K-NN), regression tree (RT), support vector machine (SVM) with a polynomial kernel, and SVM with a radial basis function (RBF) kernel. A detailed experimental analysis was carried out on the soil reports dataset using overall accuracy, the area under the receiver operating characteristic curve (AUC-ROC), mean absolute prediction error (MAE), root mean square error (RMSE), and the coefficient of determination (R2). The results compare the solution classes and conclude that SGD outperforms the other approaches. Finally, the results support selecting or recommending a class that suggests suitable fertilizer to crops for maximum production.
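A single-machine analogue of this comparison can be sketched with scikit-learn (assuming a recent version that accepts loss="log_loss"): an SGD-fitted logistic regression versus a random forest on synthetic N/P/K/pH features with invented fertilizer classes. The Mahout/Hadoop setup and the real Tata soil reports are not reproduced.

```python
# Illustrative only: SGD-based classifier vs. random forest on synthetic
# soil-style tabular data. Feature ranges and class rule are invented.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(7)
n = 600
# columns: N, P, K, pH (ranges are invented)
X = rng.uniform([0, 0, 0, 4.5], [140, 145, 205, 9.0], size=(n, 4))
y = (X[:, 0] < 50).astype(int) + (X[:, 2] < 80).astype(int)  # 3 toy classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# SGD benefits from scaled features; the forest does not need scaling
sgd = make_pipeline(StandardScaler(),
                    SGDClassifier(loss="log_loss", max_iter=1000, random_state=0))
rf = RandomForestClassifier(n_estimators=100, random_state=0)

for name, model in [("SGD", sgd), ("RF ", rf)]:
    model.fit(X_tr, y_tr)
    print(name, "accuracy:", round(accuracy_score(y_te, model.predict(X_te)), 3))
```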
APA, Harvard, Vancouver, ISO, and other styles
46

Elzeki, Omar M., Mahmoud Shams, Shahenda Sarhan, Mohamed Abd Elfattah, and Aboul Ella Hassanien. "COVID-19: a new deep learning computer-aided model for classification." PeerJ Computer Science 7 (February 18, 2021): e358. http://dx.doi.org/10.7717/peerj-cs.358.

Full text
Abstract:
Chest X-ray (CXR) imaging is one of the most feasible modalities for early diagnosis of infection with the COVID-19 virus, which the World Health Organization (WHO) reported in December 2019 and later classified as a pandemic. COVID-19 is caused by a rapidly mutating natural virus that belongs to the coronavirus family. CXR scans are among the vital tools for early detection of COVID-19 and for monitoring and controlling the spread of the virus. Classification of COVID-19 aims to detect whether a subject is infected or not. In this article, a model called Chest X-Ray COVID Network (CXRVN) is proposed for analyzing and evaluating grayscale CXR images, based on three different COVID-19 X-ray datasets. The proposed CXRVN model is a lightweight architecture that depends on a single fully connected layer representing the essential features, thus reducing total memory usage and processing time versus pre-trained and other models. CXRVN adopts two optimizers, mini-batch gradient descent and the Adam optimizer, and the model achieves almost the same performance with either. Besides, CXRVN accepts CXR images in grayscale, a natural representation for CXR that consumes less memory storage and processing time. Hence, CXRVN can analyze a CXR image with high accuracy in a few milliseconds. The learning process culminates in decision making using a SoftMax scoring function that leads to a high true-positive classification rate. The CXRVN model is trained on three different datasets and compared to the pre-trained models GoogleNet, ResNet, and AlexNet, using fine-tuning and transfer learning for the evaluation process. To verify the effectiveness of the CXRVN model, it was evaluated in terms of well-known performance measures such as precision, sensitivity, F1-score, and accuracy. The evaluation results based on sensitivity, precision, recall, accuracy, and F1 score demonstrated that, after GAN augmentation, the accuracy reached 96.7% in experiment 2 (Dataset-2) for two classes and 93.07% in experiment 3 (Dataset-3) for three classes, while the average accuracy of the proposed CXRVN model is 94.5%.
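The two optimizers the abstract mentions can be contrasted on a toy problem. This is only a sketch of the update rules on a small logistic-regression loss standing in for the network's loss; CXRVN itself, its data, and its hyperparameters are not reproduced.

```python
# Illustrative sketch: mini-batch gradient descent vs. Adam on a toy
# logistic-regression objective. Learning rates and data are invented.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(512, 10))
y = (X @ rng.normal(size=10) > 0).astype(float)

def grad(w, idx):                        # mini-batch gradient of log-loss
    p = 1.0 / (1.0 + np.exp(-X[idx] @ w))
    return X[idx].T @ (p - y[idx]) / len(idx)

def minibatch_gd(lr=0.5, batch=32, steps=300):
    w = np.zeros(10)
    for t in range(steps):
        idx = rng.choice(len(X), batch, replace=False)
        w -= lr * grad(w, idx)
    return w

def adam(lr=0.05, batch=32, steps=300, b1=0.9, b2=0.999, eps=1e-8):
    w, m, v = np.zeros(10), np.zeros(10), np.zeros(10)
    for t in range(1, steps + 1):
        idx = rng.choice(len(X), batch, replace=False)
        g = grad(w, idx)
        m = b1 * m + (1 - b1) * g        # first-moment estimate
        v = b2 * v + (1 - b2) * g * g    # second-moment estimate
        m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

for name, w in [("mini-batch GD", minibatch_gd()), ("Adam", adam())]:
    acc = np.mean(((X @ w) > 0) == y)
    print(f"{name}: training accuracy {acc:.3f}")
```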
APA, Harvard, Vancouver, ISO, and other styles
47

Delfin, Leandro Morera, Raul Pinto Elias, Humberto de Jesus Ochoa Dominguez, and Osslan Osiris Vergara Villegas. "Driving Maximal Frequency Content and Natural Slopes Sharpening for Image Amplification with High Scale Factor." Current Medical Imaging Formerly Current Medical Imaging Reviews 16, no. 1 (January 6, 2020): 36–49. http://dx.doi.org/10.2174/1573405614666180319160045.

Full text
Abstract:
Background: In this paper, a method for adaptive Pure Interpolation (PI) in the frequency domain, with gradient auto-regularization, is proposed. Methods: The input image is transformed into the frequency domain and convolved with the Fourier Transform (FT) of a 2D sampling array (the interpolation kernel) of initial size L × M. The Inverse Fourier Transform (IFT) is applied to the output coefficients, and the edges are detected and counted. To obtain a denser kernel, the sampling array is interpolated in the frequency domain, convolved again with the transform coefficients of the original low-resolution image, and transformed back into the spatial domain. The process is repeated until a maximum number of edges is reached in the output image, indicating that a locally optimal magnification factor has been attained. Finally, a maximum ascent–descent gradient auto-regularization method is designed, and the edges are sharpened. Results: For the gradient management, a new strategy is proposed, referred to as the Natural bi-Directional Gradient Field (NBGF). It naturally follows a pair of orthogonal directional gradient fields. Conclusion: The proposed procedure is comparable to novel algorithms reported in the state of the art, with good results at high amplification scales.
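The adaptive kernel and the edge-count stopping rule are specific to the paper, but the pure frequency-domain interpolation it iterates on has a standard zero-padding form, sketched below under the assumption of a grayscale image and an integer scale factor.

```python
# Sketch of plain frequency-domain (pure) interpolation by zero-padding
# the centered spectrum; the paper's adaptive kernel and edge-counting
# loop are not reproduced here.
import numpy as np

def fft_upscale(img, factor):
    """Upscale a grayscale image by zero-padding its centered spectrum."""
    h, w = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))
    H, W = h * factor, w * factor
    padded = np.zeros((H, W), dtype=complex)
    top, left = (H - h) // 2, (W - w) // 2
    padded[top:top + h, left:left + w] = F   # keep the original frequencies
    out = np.fft.ifft2(np.fft.ifftshift(padded)).real
    return out * factor * factor             # preserve mean intensity

img = np.outer(np.hanning(16), np.hanning(16))  # toy 16x16 image
big = fft_upscale(img, 4)                       # 64x64 result
print(img.shape, "->", big.shape, "mean ratio:", big.mean() / img.mean())
```

Repeating this with progressively denser kernels and counting edges after each pass would approximate the paper's loop for choosing a locally optimal magnification factor.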
APA, Harvard, Vancouver, ISO, and other styles
48

Curiel, D. T., C. Vogelmeier, R. C. Hubbard, L. E. Stier, and R. G. Crystal. "Molecular basis of alpha 1-antitrypsin deficiency and emphysema associated with the alpha 1-antitrypsin Mmineral springs allele." Molecular and Cellular Biology 10, no. 1 (January 1990): 47–56. http://dx.doi.org/10.1128/mcb.10.1.47.

Full text
Abstract:
The Mmineral springs alpha 1-antitrypsin (alpha 1AT) allele, causing alpha 1AT deficiency and emphysema, is unique among the alpha 1AT-deficiency alleles in that it was observed in a black family, whereas most mutations causing alpha 1AT deficiency are confined to Caucasian populations of European descent. Immobilized pH gradient analysis of serum demonstrated that alpha 1AT Mmineral springs migrated cathodal to the normal M2 allele. Evaluation of Mmineral springs alpha 1AT as an inhibitor of neutrophil elastase, its natural substrate, demonstrated markedly lower than normal function. Characterization of the alpha 1AT Mmineral springs gene demonstrated that it differed from the common normal M1(Ala213) allele by a single-base substitution causing the amino acid substitution Gly-67 (GGG) → Glu-67 (GAG). Capitalizing on the fact that this mutation creates a polymorphism for the restriction endonuclease AvaII, family analysis demonstrated that the Mmineral springs alpha 1AT allele was transmitted in an autosomal-codominant fashion. Evaluation of genomic DNA showed that the index case was homozygous for the alpha 1AT Mmineral springs allele. Cytoplasmic blot analysis of blood monocytes of the Mmineral springs homozygote demonstrated levels of alpha 1AT mRNA transcripts comparable to those in cells of a normal M1 (Val213) homozygote control. Evaluation of in vitro translation of Mmineral springs alpha 1AT mRNA transcripts demonstrated a normal capacity to direct the translation of alpha 1AT. Evaluation of secretion of alpha 1AT by the blood monocytes by pulse-chase labeling with [35S]methionine, however, demonstrated less secretion by the Mmineral springs cells than by normal cells. To characterize the posttranslational events causing the alpha 1AT-secretory defect associated with the alpha 1AT Mmineral springs gene, retroviral gene transfer was used to establish polyclonal populations of murine fibroblasts containing either a normal human M1 alpha 1AT cDNA or an Mmineral springs alpha 1AT cDNA and expressing comparable levels of human alpha 1AT mRNA transcripts. Pulse-chase labeling of these cells with [35S]methionine demonstrated less secretion of human alpha 1AT from the Mmineral springs cells than from the M1 cells, and evaluation of cell lysates also demonstrated lower amounts of intracellular human alpha 1AT in the Mmineral springs cells than in the normal M1 control cells. Thus, the Gly-67 → Glu mutation that characterizes Mmineral springs causes reduced alpha 1AT secretion on the basis of aberrant posttranslational alpha 1AT biosynthesis by a mechanism distinct from that associated with the alpha 1AT Z allele, whereby intracellular aggregation of the mutant protein is etiologic of the alpha 1AT-secretory defect. Furthermore, for the alpha 1AT protein that does reach the circulation, this mutation markedly affects the ability of the molecule to inhibit neutrophil elastase; i.e., the alpha 1AT Mmineral springs allele predisposes to emphysema on the basis of serum alpha 1AT deficiency coupled with alpha 1AT dysfunction.
APA, Harvard, Vancouver, ISO, and other styles
49

Nitta, Tohru. "Natural Gradient Descent for Training Stochastic Complex-Valued Neural Networks." International Journal of Advanced Computer Science and Applications 5, no. 7 (2014). http://dx.doi.org/10.14569/ijacsa.2014.050729.

Full text
APA, Harvard, Vancouver, ISO, and other styles