%astro-ph/0103467
\documentclass[preprint]{aastex} \renewcommand{\theenumi}{\Roman{enumi}} \newcommand{\iint}{\int\!\!\int} \def\<#1>{\langle#1\rangle} \def\pcite#1{[ref]} \def\comment#1{$\bf\{${\small#1}$\bf\}$} \slugcomment{Revised: Mar 13, 2001} \shortauthors{D. Chakrabarty \& P. Saha} \shorttitle{Galactic Center Black Hole} \begin{document} \title{A Non-Parametric Estimate\\ of the Mass of the Central Black Hole\\ in the Galaxy} \author{Dalia Chakrabarty\altaffilmark{1}} \affil{Department of Physics (Astrophysics)\\ University of Oxford\\ Keble Road, Oxford OX1~3RH, UK} \altaffiltext{1}{Department of Physics and Astronomy\\ University of Leicester\\ University Road, Leicester LE1~3RH, UK} \and \author{Prasenjit Saha} \affil{Astronomy Unit, School of Mathematical Sciences\\ Queen Mary and Westfield College\\ London E1~4NS, UK} \begin{abstract} We estimate the mass of the central black hole in our Galaxy from stellar kinematical data published by \citet{ghez} and \citet{gerhard}. For this we develop a method, related to \citet{david}, for non-parametrically reconstructing the mass profile and the stellar distribution function in the central region of the Galaxy from discrete kinematic data, including velocity errors. Models are searched using the Metropolis algorithm. We assume that the mass distribution is spherical and the stellar velocity distribution is isotropic, and devise a test of this assumption.
From proper motions we obtain an enclosed mass of $(2.0\pm0.7)\times10^6\,{\rm M}_{\odot}$ within the inner $0.0044$\,pc, from radial velocities we obtain a mass of $2.2^{+1.6}_{-1.0}\times10^6\,{\rm M}_{\odot}$ within $0.046$\,pc, and from three-dimensional velocities we obtain $1.8^{+0.4}_{-0.3}\times10^6\,{\rm M}_{\odot}$ within $0.046$\,pc. \end{abstract} \keywords{Galaxy: center; Galaxy: kinematics and dynamics} \section{Introduction} The center of our Galaxy has been extensively studied at wavelengths from the near infrared to the radio \citep{sandqvist}. The dynamical center of the Galaxy is considered to coincide with a compact synchrotron radio source, Sgr~A$^*$, which is associated with a black hole of mass of order 10$^6\,{\rm M}_{\odot}$ \citep{bower}. Kinematics in the neighborhood of this black hole would provide a direct estimate of the mass interior to the orbits. With this in mind, observational studies of gas and stellar motions in the very central regions of our Galaxy have been undertaken over the last few years. This paper develops a new non-parametric method for mapping the distribution function and the underlying potential of a spherical isotropic stellar system from discrete kinematic data, and applies it to published radial velocity and proper motion data in the Galactic center region to re-estimate the mass of the central black hole. \section{Observational studies} The notion that our own Galaxy could harbor a massive black hole at its center grew out of theories that proposed central massive black holes in galactic nuclei generally. This hypothesis regarding the Milky Way gained a stronger footing with the arrival of favorable observational evidence, starting with the report of high-velocity ionized gas emitting in Ne~II at $12.8\,\mu$m at the Galactic center by \citet{wollman}.
That report imposed an upper limit of $4\times10^6\,{\rm M}_{\odot}$ on the mass enclosed within a radius of $0.8$\,pc of the Galactic center. Similar observations of the $12.8\,\mu$m Ne~II emission from the Galactic center were reported in subsequent years by \citet{lacy1} and \citet{lacy2}. It was later realized that the emission comes from large-scale flows of ionized gas \citep{lo,serabynlacy,genzel85,gusten}. The streamer velocities and positions could be fitted by orbital motions, and such fits indicated the presence of a mass of $\approx{3}\times10^6\,{\rm M}_{\odot}$ within the inner $0.5$\,pc. However, stellar motions were still needed for a more reliable estimate of the mass in the central region of the Galaxy, since gas dynamics can be governed by forces other than gravity. The early-type stars that form a compact cluster in the central $10''$ have also been observed. Such an observational programme led \citet{krabbe95} to infer the existence of a central mass of $2$--$4\times10^6\,{\rm M}_{\odot}$ within the central $0.15$\,pc. \citet{genzel1} reported the results of a radial motion study of $222$ stars within the central 1\,pc of the dynamical center of our Galaxy. This study indicated a central core of radius $\leqslant{0.06}$\,pc and mass 2.2--3.2$\times10^6\,{\rm M}_{\odot}$. The work of \citet{genzel2} was the first study of proper motions of stars in the Galactic center. Combining the radial motion data of \citet{genzel1} with the proper motion data of \citet{genzel2}, the authors concluded that the central core of our Galaxy has a mass of $(2.45\pm0.4)\times10^6\,{\rm M}_{\odot}$ and lies within $0.015$\,pc of Sgr~A$^*$. \citet{ghez} estimated the mass of the central dark core of the Milky Way to be $(2.6\pm0.2)\times10^6\,{\rm M}_{\odot}$ and its volume to be $\leqslant10^{-6}$\,pc$^3$, on the basis of the proper motions of 90 stars in their sample.
The latest report of stellar dynamics in the Galactic center is \citet{gerhard}. In this analysis, anisotropy of the stellar motions is considered for the first time. The authors conclude that while isotropy is a good overall assumption, there are distinct groups of stars that display significant anisotropy in their motions. They estimate a central mass of $2.6$--$3.3\times10^6\,{\rm M}_{\odot}$, depending on the modeling of their data. \subsection{Radial and proper motion study by Genzel et al.\ and Eckart \& Genzel} \citet{genzel1} observed radial velocities of 223 stars within $\approx0.1$\,pc. \citet{genzel2} reported proper motions of stars in the central gravitational field of the Galaxy. The observational programme spanned a total of four years, at a resolution of $\approx0.15''$. It was found that the velocity dispersions along the three mutually orthogonal spatial axes were very similar, implying a very low degree of central anisotropy. A Keplerian fall-off of velocity dispersion with projected radius [$\sigma(r_p)\propto r_p^{-0.5}$] was established, strongly implying the existence of a central black hole. This provided the physical motivation to model the center as a point mass and the velocity field as isotropic. The mass of the central core of the Galaxy could then be estimated by fitting the parameters of this model to the radial and proper motion data. A simple virial estimate was supplemented by a \citet{BT} mass estimate for the Galactic core; for the latter, the point mass was considered to be embedded in an isotropic cluster with a velocity dispersion of order $50$\,km/s at infinity. The virial and Bahcall-Tremaine mass estimators gave similar values [$(2.12\pm0.7)\times{10}^{6}\,{\rm M}_{\odot}$] for the mass within a projected radius of $0.24$\,pc.
Application of the Jeans equations to the radial and proper motion data of \citet{genzel1} and \citet{genzel2} implied the existence of a central mass of $2.4\times10^6\,{\rm M}_{\odot}$, with a total (systematic and fitting) uncertainty of about $\pm0.64\times10^6\,{\rm M}_{\odot}$. This mass is considered to be centered within 0.015$\pm$0.02\,pc east of Sgr~A$^*$. \subsection{Proper motion study by Ghez et al.} In this work Ghez and co-workers observed proper motions in the central cluster of the Galaxy. They followed the time evolution of the positions of $90$ stars over a period of two years, at a resolution of $\approx0.05''$. Radial velocities were measured only for stars at the largest radii ($\approx0.1$\,pc). These few values of the line-of-sight velocity dispersion, when compared to the projected radial and tangential velocity dispersions, indicated a good degree of isotropy in the central regions. Modeling the center as a point mass, they estimated a central mass of $(2.6\pm{0.2})\times10^6\,{\rm M}_{\odot}$ interior to a radius of $0.015$\,pc. \subsection{Anisotropy-incorporated analysis of stellar dynamics by Genzel et al.} The data set used in \citet{gerhard} includes the line-of-sight velocity measurements reported earlier in \citet{genzel1}, improved values of the proper motions reported in \citet{genzel2}, and the proper motions reported in \citet{ghez}. This ``homogenized best'' data set includes proper motions of $104$ stars with their errors, as well as 227 radial velocities with errors in these line-of-sight (LOS) velocities. For 32 stars both sky and line-of-sight velocities are reported. The assumption of isotropy is concluded to be broadly good, though individual groups of stars show significant anisotropy (examples of both radial and tangential anisotropy are found).
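The kind of projected-dispersion diagnostic used in such anisotropy checks can be sketched numerically. The following Python fragment is purely illustrative (the data are synthetic, and the star count and choice of seven equal-population annuli are hypothetical, not those of the actual data set): it decomposes sky velocities into components along and perpendicular to the projected radius and compares their dispersions annulus by annulus.

```python
import numpy as np

# Illustrative sketch only (not the authors' code): per-annulus projected
# radial/tangential dispersions as a simple anisotropy diagnostic.
rng = np.random.default_rng(0)
n = 104                                          # hypothetical star count
x, y = rng.normal(size=n), rng.normal(size=n)    # synthetic sky positions
vx, vy = rng.normal(size=n), rng.normal(size=n)  # synthetic sky velocities

r_p = np.hypot(x, y)
v_R = (vx * x + vy * y) / r_p                    # component along projected radius
v_T = (vy * x - vx * y) / r_p                    # component perpendicular to it

edges = np.quantile(r_p, np.linspace(0.0, 1.0, 8))   # 7 equal-population annuli
for i in range(7):
    sel = (r_p >= edges[i]) & (r_p <= edges[i + 1])
    sig_R, sig_T = v_R[sel].std(), v_T[sel].std()
    print(f"annulus {i}: sigma_T/sigma_R = {sig_T / sig_R:.2f}")
# Ratios near 1 in every annulus indicate consistency with isotropy
# in projection; systematic departures flag anisotropic groups.
```

Since the decomposition is just a rotation in the sky plane, $v_R^2+v_T^2$ reproduces $v_x^2+v_y^2$ star by star, which is a convenient sanity check on the geometry.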
The whole sample of stars is divided into $7$ annuli, and the projected radial and tangential velocity dispersions corresponding to each annulus are calculated. The global expectations of these projected velocity dispersions can be related to the anisotropy parameter, $\beta$; thus for each radial annulus, the anisotropy parameter is obtained. The significance of the observed anisotropy is checked against data generated from Monte Carlo clusters defined by simple anisotropic distribution functions. Using the value of $\beta$ for each annulus, \citet{LM}, Bahcall-Tremaine and virial mass estimates are made for the very central region, the respective results being $(2.9\pm0.4)\times10^6\,{\rm M}_{\odot}$, $(3.1\pm0.3)\times10^6\,{\rm M}_{\odot}$ and $2.5$--$2.6\times10^6\,{\rm M}_{\odot}$. \section{Motivation for a non-parametric scheme} \label{sec:motivation} The observational works discussed above obtain the mass in the central region of the Galaxy using either the Jeans equation or projected mass estimators such as the Bahcall-Tremaine and Leonard-Merritt estimators or the virial theorem. The shortcomings of these methods are as follows. \begin{itemize} \item The implementation of the Jeans equation requires smooth approximations to the projected dispersions and the surface density, which are subsequently related to the total dispersion and the spatial density via the Abel integral equation. This procedure suffers from shot noise in radial bins, which tends to make the Abel inversion unstable \citep{genzel1}. In addition, the Jeans equations do not constrain the distribution function to be non-negative. \item The mass estimators are derived by taking moments of the Jeans equations, and relate $\langle r^nM(r)\rangle$ (where $M(r)$ is the enclosed mass, $n$ is an integer, and the average is over the stellar density) to observed velocity dispersions. Being moments, they contain only limited information about the mass distribution.
Moreover, for $n=0$ (the Bahcall-Tremaine and Leonard-Merritt estimators) stars far from the center get more weight, which is undesirable when estimating a black hole mass. Also, kinematic data sets in practice sample only part of the potential, so some modeling or extrapolation is necessary. \end{itemize} A non-parametric method, which can reconstruct both the density and the distribution function without assuming functional forms for either, would avoid all these problems. \section{Practical algorithms used to solve the spherical inverse problem} \label{sec:algorithm} Here we present a scheme for studying the spherical inverse problem under the assumption of isotropy. This algorithm is inspired by the work of \citet{david}, but differs in some important ways: it uses both radial velocity and proper motion data rather than just the former, and it uses the Metropolis algorithm rather than maximum penalized likelihood. These differences are discussed in detail in the next section. The scheme of \citet{david} is a generic algorithm that does not involve the assumption of isotropy; a simple anisotropic distribution function depending on both the energy and the line-of-sight component of the angular momentum is considered there. In this paper, however, we solve the simpler case of an isotropic distribution function. (We buttress our analysis with a check of the consistency of the data with isotropy.) The algorithm of \citet{david}, which uses discrete data, is discussed below. \subsection{A generic non-parametric method} Let a stellar system be represented by an equilibrium distribution function (DF), $f$, which is a function of energy $E$ and angular momentum $L$. One way of projecting $f$ into observable space is to first project it onto the plane of the sky and then along the line-of-sight velocity axis in phase space. The resulting projected DF then depends on the plane-of-sky coordinates and on the LOS velocity coordinate.
This projected DF, say $\nu_p$, is related to $f$ via the integral equation \begin{equation} \nu_p(x,y,v_p)=\int{dz\iint{dv_x\,dv_y\,f(E,L)}}, \label{eq:integral} \end{equation} where the LOS is along the $z$-direction while the plane of the sky is scanned by the $x$-$y$ plane. If the stellar system being studied exhibits spherical geometry, then the last equation becomes \begin{equation} \nu_p(r_p,v_p)=\int{dz\iint{dv_x\,dv_y\,f[{\textstyle\frac{1}{2}}(v_r^2+v_t^2)+\Phi(r),L]}}. \label{eq:proj} \end{equation} Here $r$ is the spherical radius and $r_p$ the cylindrical radius on the plane of the sky, so that $r_p^2=x^2+y^2=r^2-z^2$; $v_r$ and $v_t$ are the velocity components parallel and tangential to the radius ${\bf{r}}$; and $\Phi(r)$ is the gravitational potential of the system. To simplify matters we choose the $y$-axis to lie in the plane containing the radius vector ${\bf{r}}$ and the LOS. This implies that $v_r=v_y\sin\theta+v_z\cos\theta$, $\theta$ being the polar angle, i.e., $\cos\theta=z/r$. A non-parametric solution determines the potential and the distribution function by searching for a pair of functions $f$ and $\Phi$ that provide the best fit to the complete velocity data. The first step is therefore to identify a way by which a trial DF can be projected into observable space, according to eqn.~\ref{eq:proj}. An easy way to do this is to approximate $f(E,L)$ as a 2-D histogram: $f$ is taken to be constant over each integral cell defined around a pair of $E$ and $L$ values. If the assumption of isotropy is invoked, the integral space becomes one-dimensional, so that $f$ is a function of $E$ only; $f$ is then approximated as constant over each energy bin in this integral space.
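To make the histogram approximation concrete, here is a minimal Python sketch of a piecewise-constant isotropic DF $f(E)$. The bin edges and trial amplitudes below are hypothetical placeholders (energy in arbitrary units); in an actual fit the amplitudes would be the quantities adjusted by the model search.

```python
import numpy as np

# Hypothetical energy grid and trial bin amplitudes for a histogram DF.
E_edges = np.linspace(-1.0, 0.0, 11)   # 10 energy bins (arbitrary units)
f_i = np.exp(-np.arange(10) / 3.0)     # non-negative, decreasing amplitudes

def f_of_E(E):
    """Histogram DF: constant within each energy bin, zero outside the grid."""
    E = np.atleast_1d(np.asarray(E, dtype=float))
    idx = np.searchsorted(E_edges, E, side="right") - 1   # bin index per energy
    inside = (idx >= 0) & (idx < len(f_i))
    out = np.zeros_like(E)
    out[inside] = f_i[idx[inside]]
    return out

# Energies in the first and last bins pick up those bins' amplitudes;
# energies outside the grid (e.g. unbound, E > 0) return zero.
print(f_of_E([-0.95, -0.05, 0.5]))
```

The piecewise-constant form is what makes the projection linear in the amplitudes $f_i$: each bin's contribution to the observables can be precomputed once and the amplitudes varied freely afterwards.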
The contribution to the projected distribution function from each such energy bin is given by \begin{equation} \nu_p^{\rm cell}(r_p,v_p)=\int{dz\, A(r,r_p,v_p)}, \end{equation} where $A(r,r_p,v_p)=\iint{dv_x\,dv_y}$ is the area that the energy bin occupies in the $(v_x,v_y)$ space. The total projected distribution function is a simple sum over all the energy bins, i.e., \begin{equation} \nu_p(r_p,v_p)=\sum_{i}f_i\nu_{pi}^{\rm cell}(r_p,v_p). \label{eq:invert} \end{equation} To get the distribution function $f$ from the projected DF, eqn.~\ref{eq:invert} has to be inverted (it can be treated as a set of linear equations). If $\nu_p(r_p,v_p)$ is a continuous function of $r_p$ and $v_p$, the inversion is straightforward; if instead the data set consists of a large number of discrete points, the projected DF can be estimated by binning the data with respect to $r_p$ and $v_p$. The probability that the observed kinematical data were drawn from $f(E)$ can be measured by the likelihood function ($L$), defined as the product of the projected distribution functions, one for each observed pair $(r_p,v_p)$: \begin{equation} \log{L} = \sum_{i=1}^{N}\log\nu_{pi}, \end{equation} where $N$ is the total number of phase-space points in the observed data set and $\nu_{pi}$ is the projected distribution function evaluated at the $i^{\rm th}$ pair of apparent position and velocity in the data set. The optimization can be carried out in the presence of linear constraints on the total number of stars, imposed via a penalty function; thus \citet{david} employed a maximum penalized likelihood. \subsection{Unique features of our algorithm} Our method is essentially an isotropic version of the above, but there are a few important differences between the two algorithms, as discussed below. \begin{enumerate} \item We derive the potential from a discretized density distribution.
The mass (including any dark matter) is assumed to be in spherical shells, with density decreasing outwards but otherwise free-form. \item As discussed in the last section, \citet{david} suggested a maximum penalized likelihood to find the distribution function from which the observed data were most likely to have been drawn. When the likelihood is highly non-linear, it cannot be optimized by linear or even quadratic programming \citep{david96}. In this case, the maxima in the likelihood can be found by brute-force search \citep{davidsaha}, but this method is rather unsatisfactory in terms of both probability of success and required CPU time. We use the Metropolis algorithm, and attempt not only to recover the model with the maximum likelihood, but to generate an {\it ensemble} of models distributed according to the likelihood. This ensemble will of course be dominated by models close to the maximum-likelihood model. The advantage of having such an ensemble is that we can estimate uncertainties simply by measuring the spread of values across the ensemble. Moreover, we can use discrete data without ever having to bin them. \item Since we have both radial and proper motion data, we can project the distribution function into observable space in three ways. First, we can project the DF onto the plane of the sky and then along the line-of-sight velocity axis in phase space. This gives the projected DF $\nu_p$, which depends on the plane-of-sky coordinates and the line-of-sight velocity coordinate. Second, we can project the DF onto the plane of the sky and then onto the plane perpendicular to the line-of-sight velocity axis in velocity space. The resulting projected distribution function depends on the plane-of-sky coordinates and on the velocity coordinates perpendicular to the line-of-sight velocity coordinate.
We name such a projected distribution function $\nu_{\perp}$. Third, for the small sample of stars with both radial and transverse velocities known, we can project the distribution function along each of the three velocity axes of phase space, subsequent to the projection onto the plane of the sky. The resulting projected distribution function depends on the apparent position on the sky and on the radial and tangential velocities; we call it $\nu_{\rm 3D}$. \item We incorporate the effects of the errors in the velocity measurements by convolving the calculated projected distribution functions with error distributions that are assumed to be Gaussian. \item Demonstrating that the data are inconsistent with anisotropy would be a surer check of the assumption of an isotropic distribution function than merely checking the consistency of the data with isotropy, but the former is a much harder test to execute. Hence we carry out our analysis by starting with the assumption of isotropy and, at the end of the analysis, perform a goodness-of-fit test to check the validity of the assumption for the observed data sets. \end{enumerate} \subsection{The Metropolis algorithm} \label{sec:metropolis} The algorithm used in our code to sample the distribution of likelihoods is the Metropolis algorithm \citep{metrop}. The likelihood function $L$ is a function of the potential and of the distribution-function histogram, i.e., $L$ is a function in $n$-dimensional space, where $n$ is the number of energy bins in the DF histogram plus the number of mass shells. Let us represent the likelihood as $L = L({\bf x})$, where ${\bf x}\equiv(\rho,f)$. The code samples the function $L({\bf x})$ through a series of iterations $L({\bf x}_n)$. Two consecutive iterations are related by a transition probability $p({\bf x}_n\rightarrow{\bf x}_{n+1})$.
The transition probabilities are chosen as \begin{equation} p({\bf x}\rightarrow{\bf x'}) = {\rm min}\left( {L({\bf x'})\over L({\bf x})},1\right), \end{equation} although in principle any choice satisfying detailed balance \begin{equation} L({\bf x'})p({\bf x'}\rightarrow{\bf x}) = L({\bf x})p({\bf x}\rightarrow{\bf x'}) \end{equation} would serve. \begin{figure} \plotone{dcps.fig1.ps} \caption{Schematic diagram to illustrate the Metropolis algorithm. The code starts at ${\bf x}_0$, and proceeds to higher $L$ at ${\bf x}_1$. A trial ${\bf x}_2$ at lower $L$ is rejected, and so ${\bf x}_1$ gets repeated as ${\bf x}_2$. Next, the code goes to ${\bf x}_3$, also at lower $L$, and then on to ${\bf x}_4$. The code wanders around near the global maximum of $L$, the sampling being proportional to $L$.} \label{fig:metrofig} \end{figure} The algorithm is schematically represented in Figure~\ref{fig:metrofig}. At each iteration, the code tries a new ${\bf x}$, i.e., new $\rho$ and $f$. These histograms are always non-negative and monotonic, with unit normalization for $f$, but otherwise arbitrary. The trial ${\bf x}$ is accepted or rejected with the transition probability; in case of rejection the code keeps the old ${\bf x}$ for the next iteration. In its early phase Metropolis searches out the region around the maxima in the likelihood, and then the more ``equilibrium'' phase sets in, when Metropolis wanders about in the region of the multi-dimensional energy-density space where the likelihood is close to maximal. The extent of this wandering provides the uncertainties in the mass estimates. As the iterations proceed, the likelihood is monitored. Initially, there is a trend towards increasing likelihood; when this trend disappears, Metropolis is in the equilibrium phase. In practice we waited till iteration $k$ such that $L({\bf x}_k)