------------------------------------------------------------------------
From dalia@astro.ox.ac.uk Mon Apr 9 19:04:21 2001
Date: Mon, 9 Apr 2001 18:04:12 +0100 (BST)
From: Dalia Chakrabarty
X-Sender: dalia@fomalhaut.physics.ox.ac.uk
To: gcnews@aoc.nrao.edu
Subject: Submitting GC Paper
MIME-Version: 1.0
%astro-ph/0103467
\documentclass[preprint]{aastex}
\renewcommand{\theenumi}{\Roman{enumi}}
\newcommand{\iint}{\int\!\!\int}
\def\<#1>{\langle#1\rangle}
\def\pcite#1{[ref]}
\def\comment#1{$\bf\{${\small#1}$\bf\}$}
\slugcomment{Revised: Mar 13, 2001}
\shortauthors{D. Chakrabarty \& P. Saha}
\shorttitle{Galactic Center Black Hole}
\begin{document}
\title{A Non-Parametric Estimate\\
of the Mass of the Central Black Hole\\
in the Galaxy}
\author{Dalia Chakrabarty\altaffilmark{1}}
\affil{Department of Physics (Astrophysics)\\
University of Oxford\\
Keble Road, Oxford OX1~3RH, UK}
\altaffiltext{1}{Department of Physics and Astronomy\\
University of Leicester\\
University Road, Leicester LE1~3RH, UK}
\and
\author{Prasenjit Saha}
\affil{Astronomy Unit, School of Mathematical Sciences\\
Queen Mary and Westfield College\\
London E1~4NS, UK}
\begin{abstract}
We estimate the mass of the central black hole in our Galaxy from
stellar kinematical data published by \citet{ghez} and
\citet{gerhard}. For this we develop a method, related to
\citet{david}, for non-parametrically reconstructing the mass profile
and the stellar distribution function in the central region of the
Galaxy from discrete kinematic data, including velocity errors.
Models are searched using the Metropolis algorithm. We assume that the
mass distribution is spherical and the stellar velocity distribution
is isotropic, and devise a test of this assumption. From proper
motions we obtain an enclosed mass of $(2.0\pm0.7)\times10^6\,{\rm
M}_{\odot}$ within the inner $0.0044$~pc, from radial velocities we
obtain a mass of $2.2^{+1.6}_{-1.0}\times10^6\,{\rm M}_{\odot}$ within
$0.046$~pc, and from three-dimensional velocities we obtain
$1.8^{+0.4}_{-0.3}\times10^6\,{\rm M}_{\odot}$ within $0.046$~pc.
\end{abstract}
\keywords{Galaxy: center; Galaxy: kinematics and dynamics}
\section{Introduction}
The center of our Galaxy has been extensively studied at wavelengths
from the near infrared to the radio \citep{sandqvist}. The dynamical
center of our Galaxy is considered to be coincident with a compact
synchrotron radio source, Sgr~A$^*$, which is associated with a black
hole of mass of order 10$^6{\rm M}_{\odot}$ \citep{bower}. Kinematics in the
neighborhood of this black hole would provide a direct estimate of the
mass interior to the orbit. With this in mind, observational studies
of gas and stellar motions in the very central regions of our Galaxy
have been undertaken over the last few years.
This paper develops a new non-parametric method for mapping the
distribution function and the underlying potential of a spherical
isotropic stellar system, from discrete kinematic data, and applies it
to published radial velocity and proper motion data in the Galactic
center region to re-estimate the mass of the central black hole.
\section{Observational studies}
The notion that our own Galaxy could harbor a massive black hole at
its center grew out of theories that suggested the presence of
central massive black holes in galactic nuclei generally. Such a
hypothesis regarding the Milky Way gained stronger footing with the
availability of favorable observational evidence, starting with the
report by \citet{wollman} of high-velocity ionized gas at the
Galactic center, detected in the $12.8\,\mu$m Ne~II line. An upper limit of
$4\times10^6{\rm M}_{\odot}$ was imposed on the mass enclosed within a
radius of $0.8$~pc from the Galactic center. Similar observations of
the $12.8\mu$m Ne II emission from the Galactic center were
reported in subsequent years in \citet{lacy1} and
\citet{lacy2}. Later it was realized that the emission was from
large-scale flows of ionized gas \citep{lo,serabynlacy,genzel85,gusten}.
The streamer velocity and position
could be fitted by orbital motions. Such fits indicated the presence
of a mass of $\approx{3}\times10^6{\rm M}_{\odot}$ within the inner
$0.5$~pc.
However, stellar motions were still needed to get a more reliable
estimate of the mass in the central region of the Galaxy, since gas
dynamics can be dictated by factors other than gravity. Observations
of the early-type stars that form a small cluster in the
central $10''$ have also been made. Such an observational programme
led \citet{krabbe95} to predict the existence of a central mass of
$2$--$4\times10^6{\rm M}_{\odot}$ within the central $0.15$~pc.
\citet{genzel1} reported the results of a radial motion study of $222$
stars within the central 1~pc of the dynamical center of our Galaxy.
This study predicted a central core of radius $\leqslant{0.06}$~pc and
mass 2.2--3.2$\times10^6{\rm M}_{\odot}$. The work of
\citet{genzel2} was the first study of proper motions of stars in the
Galactic center. Combining the radial motion data of \citet{genzel1}
and the proper motion data of \citet{genzel2}, Genzel and Eckart
concluded that the central core of our Galaxy has a mass
of $(2.45\pm0.4)\times10^6{\rm M}_{\odot}$ and lies within $0.015$~pc
of Sgr~A$^*$. \citet{ghez} estimated the mass of the central dark core
of the Milky Way to be $(2.6\pm0.2)\times10^6{\rm M}_{\odot}$ and its
volume to be $\leqslant10^{-6}$pc$^3$. This conclusion was reached on
the basis of the proper motion data of 90 stars in their sample. The
latest report of stellar dynamics in the Galactic center is in
\citet{gerhard}. In this analysis, anisotropy of stellar motions is
considered for the first time. The authors conclude that while
isotropy is a good assumption, there are distinct groups of stars that
display significant anisotropy in their motions. They estimate a
central mass of 2.6--3.3$\times10^6{\rm M}_{\odot}$ depending on the
modeling of their data.
\subsection{Radial and proper motion study by Genzel et al.\
and Eckart \& Genzel}
\citet{genzel1} observed radial velocities of 223 stars within
$\approx0.1$~pc. \citet{genzel2} reported proper motions of stars in
the central gravitational field of the Galaxy. The observational
programme spanned a total of four years, at a resolution of $\approx0.15''$.
It was found that the velocity dispersions along the three mutually
orthogonal spatial axes were very similar, implying a very low degree
of central anisotropy.
A strong Keplerian fall-off of velocity dispersion with projected
radius [$\sigma(r_p)\propto r_p^{-0.5}$] was established, thus
strongly implying the existence of a central black hole. This
provided a strong physical motivation to model the center as a point
mass and the velocity field as isotropic. The mass estimate of the
central core of the Galaxy could then be found by fitting the
parameters of the model to the radial and proper motion data. A
simple virial estimate was supplemented by a \citet{BT} mass estimate
for the Galactic core. For this latter scheme the point mass was
considered to be embedded in an isotropic cluster of dispersions of
the order of $50$~km/s at infinity. Virial and Bahcall--Tremaine mass
estimators gave similar values, $(2.12\pm0.7)\times10^{6}{\rm
M}_{\odot}$, for the mass within a projected radius of $0.24$~pc. Using
the Jeans equations with the radial and proper motion data from
\citet{genzel1} and \citet{genzel2} implied the existence of a central
mass of $2.4\times10^6{\rm M}_{\odot}$ with a total (systematic and
fitting) uncertainty of about $\pm0.64\times10^6{\rm M}_{\odot}$. This
mass is considered to be concentrated within a radius of
$0.015\pm0.02$~pc east of Sgr~A$^*$.
\subsection{Proper motion study by Ghez et al.}
In this work Ghez and co-workers observed proper motions of the
central cluster of the Galaxy. They followed the time evolution of the
positions of $90$ stars over a period of two years at a resolution of
$\approx0.05''$. The radial velocities were measured only for stars at
the largest radii ($\approx0.1$~pc). These few values of the
line-of-sight velocity dispersion, when compared to the projected
radial and tangential velocity dispersions indicated a good degree of
isotropy in the central regions. Modeling the center as a point mass,
they estimated a central mass of $(2.6\pm{0.2})\times10^6{\rm
M}_{\odot}$ interior to a radius of $0.015$pc.
\subsection{Anisotropy incorporated analysis of stellar dynamics by
Genzel et al.} In \citet{gerhard}, the data set used includes line of sight
velocity measurements reported earlier in \citet{genzel1}, improved values
of proper motions reported in \citet{genzel2}, and proper motions reported in
\citet{ghez}. This ``homogenized best'' data set includes proper motions
of $104$ stars, with their errors as well as 227 radial velocities and
errors in these line-of-sight (LOS) velocities. For 32 stars both sky and
line-of-sight velocities are reported. The assumption of isotropy is
concluded to be broadly good, though individual groups of stars
show significant anisotropy (examples of both radial and
tangential anisotropy are found). The whole sample
of stars is divided into $7$ annuli and the projected radial and
tangential velocity dispersions corresponding to each annulus are
calculated. The global expectations of these projected velocity
dispersions can be related to the anisotropy parameter, $\beta$. Thus
for each radial annulus, the anisotropy parameter is obtained. The
significance of the observed anisotropy is checked against data
generated by Monte Carlo clusters defined by simple anisotropic
distribution functions. Using the value of $\beta$ for each
annulus, \citet{LM}, Bahcall-Tremaine and virial mass estimates are
made for the very central region, the respective results being
$(2.9\pm0.4)\times10^6{\rm M}_{\odot}$, $(3.1\pm0.3)\times10^6{\rm
M}_{\odot}$ and 2.5--2.6$\times10^6{\rm M}_{\odot}$.
\section{Motivation for a non-parametric scheme}
\label{sec:motivation}
The observational works discussed above obtain the mass
in the central region of the Galaxy using either the Jeans equation or
projected mass estimators like Bahcall-Tremaine, Leonard-Merritt, or
the virial theorem. The shortcomings of these methods are as follows.
\begin{itemize}
\item The implementation of the Jeans equation requires smooth
approximations to the projected dispersions and the surface density,
that can be subsequently related to the total dispersion and the
spatial density via the Abel integral equation. This suffers
from the problem of shot noise in radial bins, which tends to make the
Abel inversion unstable \citep{genzel1}. In addition, the Jeans
equations do not constrain the distribution function to be
non-negative.
\item The mass estimators are derived by taking moments of the Jeans
equations, and relate $\langle r^nM(r)\rangle$ (where $M(r)$ is the
enclosed mass, $n$ is an integer, and the average is over the stellar
density) to observed velocity dispersions. Being moments, they
contain only limited information about the mass distribution.
Moreover, for $n=0$ (Bahcall-Tremaine and Leonard-Merritt estimators)
stars far from the center get more weight, which is undesirable when
estimating a black hole mass. Also, kinematic data sets in practice
sample only part of the potential, so some modeling or extrapolation
is necessary.
\end{itemize}
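For concreteness, the Bahcall--Tremaine type of projected mass
estimator criticized above, $M_{\rm BT}=\frac{16}{\pi G}\langle
v_z^2\,R\rangle$, can be sketched in a few lines; the velocities and
radii below are invented toy numbers, not data from any of the papers
discussed here.

```python
import numpy as np

G = 4.301e-3  # gravitational constant in pc (km/s)^2 / Msun

def bahcall_tremaine_mass(v_los, r_p):
    """Bahcall-Tremaine projected mass estimator for an isotropic tracer
    population around a point mass: M = (16 / pi G) <v_los^2 R>.
    Being a single moment of the data, it compresses the whole mass
    profile into one number -- the limitation noted in the text."""
    v_los = np.asarray(v_los, dtype=float)   # line-of-sight velocities [km/s]
    r_p = np.asarray(r_p, dtype=float)       # projected radii [pc]
    return 16.0 / (np.pi * G) * np.mean(v_los**2 * r_p)

# toy numbers: a star at 0.1 pc with a 100 km/s line-of-sight speed
print(bahcall_tremaine_mass([100.0], [0.1]))   # ~1.2e6 Msun
```

Note also that the average weights each star by $v_z^2 R$, which is why
stars far from the center dominate the estimate.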
A non-parametric method, which can reconstruct both density and
distribution function without assuming functional forms for either,
would avoid all these problems.
\section{Practical algorithms used to solve the spherical inverse problem}
\label{sec:algorithm}
Here we present a scheme for studying the spherical inverse problem
under the assumption of isotropy. This algorithm is inspired by the
work in \citet{david}, but differs in some important ways: it uses
both radial velocity and proper motion data rather than just the
former, and uses the Metropolis algorithm rather than
maximum penalized likelihood. These differences are discussed in
detail in the next section. The scheme reported in \citet{david} is a
generic algorithm that does not involve the assumption of isotropy; a
simple anisotropic distribution function that depends on both energy
and the line-of-sight component of the angular momentum is considered.
However, in this report, we solve the simpler case of the isotropic
distribution function. (We buttress our analysis with a check of the
consistency of the data with isotropy.) The algorithm in
\citet{david}, which uses discrete data, is discussed below.
\subsection{A generic non-parametric method}
Let a stellar system be represented by an equilibrium distribution
function (DF), $f$, which is a function of energy $E$ and angular
momentum, $L$. One way of projecting $f$ into the observable space is
by first projecting it on the plane of the sky and then along the line
of sight velocity axis in the phase space. The resulting projected DF
will then be dependent on the plane of sky coordinates and also on the
LOS velocity coordinate. This projected DF, say $\nu_p$, is related to
$f$ via the integral equation
\begin{equation}
\nu_p(x,y,v_p)=\int{dz\iint{dv_x\,dv_y\,f(E,L)}},
\label{eq:integral}
\end{equation}
where the LOS is along the $z$-direction while the plane of the sky is
scanned by the $x$-$y$ plane. If the stellar system being studied
exhibits spherical geometry, then the last equation becomes
\begin{equation}
\nu_p(r_p,v_p)=\int{dz\iint{dv_x\,dv_y\,f\left[{\textstyle\frac{1}{2}}(v_r^2+v_t^2)+\Phi(r),L\right]}}.
\label{eq:proj}
\end{equation}
Here $r$ is the spherical radius and $r_p$ the cylindrical radius
on the plane of the sky, so that $r_p^2=x^2+y^2=r^2-z^2$; $v_r$ and
$v_t$ are components parallel and tangential to the radius
${\bf{r}}$; $\Phi(r)$ is the gravitational potential of this system. To
simplify matters we could choose the $y$-axis to lie completely in the
plane containing the radius vector ${\bf{r}}$ and the LOS. This implies
that $v_r=v_y\sin\theta+v_z\cos\theta$, $\theta$ being the
polar angle, i.e., $\cos\theta=z/r$.
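A quick numerical check of this geometry (toy numbers only): with the
$y$-axis chosen in the plane containing ${\bf r}$ and the line of
sight, the unit radius vector is ${\hat{\bf
r}}=\sin\theta\,{\hat{\bf y}}+\cos\theta\,{\hat{\bf z}}$, so the
radial component of an arbitrary velocity is its dot product with
${\hat{\bf r}}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy star: y-axis chosen in the plane containing r and the LOS (z-axis),
# so the position has x = 0 by construction.
y, z = 3.0, 4.0
r = np.hypot(y, z)                 # spherical radius, r^2 = r_p^2 + z^2
sin_t, cos_t = y / r, z / r        # polar angle theta, with cos(theta) = z/r

v = rng.normal(size=3)             # arbitrary velocity (v_x, v_y, v_z)
v_r_dot = (v[1] * y + v[2] * z) / r        # v . r_hat computed directly
v_r_proj = v[1] * sin_t + v[2] * cos_t     # sin/cos projection of v_y, v_z
assert np.isclose(v_r_dot, v_r_proj)
```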
A non-parametric solution determines the potential and the
distribution function by searching for a pair of functions $f$ and $\Phi$
that provide the best-fit to the complete velocity data. The first
step is therefore to identify a way by which a trial DF could be
projected into observable space, according to eqn.~\ref{eq:proj}. An
easy way to do this would be to approximate $f(E,L)$ as a 2-D
histogram.
Thus $f$ is approximated to be a constant over any integral cell
defined around a pair of $E$ and $L$ values. If, of course, the
assumption of isotropy is invoked, the integral space becomes one
dimensional so that $f$ is a function of $E$ only. $f$ is then
approximated to be a constant over any energy bin in this integral
space. The contribution to the projected distribution function from
each such energy bin is given by
\begin{equation}
\nu_p^{\rm cell}(r_p,v_p)=\int{dz\, A(r,r_p,v_p)},
\end{equation}
where $A(r,r_p,v_p)=\iint{dv_xdv_y}$ is the area that the energy
integral bin occupies in the $(v_x,v_y)$ space. The total projected
distribution function is a simple sum over all the energy bins, i.e.,
\begin{equation}
\nu_p(r_p,v_p)=\sum_{i}f_i\nu_{pi}^{\rm cell}(r_p,v_p).
\label{eq:invert}
\end{equation}
To get the total distribution function $f$ from the projected DF,
eqn.~\ref{eq:invert} has to be inverted (it can be treated
as a set of linear equations). If $\nu_p(r_p,v_p)$ is a continuous function
of $r_p$ and $v_p$, the inversion is straightforward, but if the data
consist of a large number of discrete points, then the projected DF
can be estimated by binning the data with respect to $r_p$ and $v_p$.
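Since eqn.~\ref{eq:invert} is linear in the histogram weights $f_i$, a
toy version of the inversion can be sketched as follows; the per-bin
projections here are random stand-ins for the true $\nu_{pi}^{\rm
cell}$ integrals, which is the only hypothetical ingredient.

```python
import numpy as np

rng = np.random.default_rng(1)
n_bins, n_grid = 4, 50

# Hypothetical per-energy-bin projected DFs: one column per energy bin,
# one row per (r_p, v_p) grid point. In the real problem each column
# comes from the projection integral nu_cell(r_p, v_p) for that bin.
A = rng.random((n_grid, n_bins))

f_true = np.array([0.5, 0.3, 0.15, 0.05])  # invented DF histogram weights
nu_p = A @ f_true                          # nu_p = sum_i f_i nu_cell_i

# Inverting the (overdetermined) linear system recovers the weights.
f_rec, *_ = np.linalg.lstsq(A, nu_p, rcond=None)
assert np.allclose(f_rec, f_true)
```

In practice the system is noisy and the recovered $f_i$ must also be
kept non-negative, which is why a likelihood search is used instead of
a direct least-squares inversion.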
The probability that the observed kinematical data were drawn from
$f(E)$ can be measured by the likelihood $L$, defined as the
product of the projected distribution function values, one for each
observed pair $(r_p,v_p)$:
\begin{equation}
\log L = \sum_{i=1}^{N}\log\nu_{pi},
\end{equation}
where $N$ is the total number of pairs of phase space points in the
observed data set and $\nu_{pi}$ is the projected distribution
function corresponding to the $i^{\rm th}$ pair of apparent position
and velocity in the data set. The optimization can be carried out
with linear constraints on the total number of stars, imposed through
a penalty function; thus in \citet{david}, a maximum penalized
likelihood was employed.
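Since $L$ is a product of projected DF values, its logarithm is the
corresponding sum of logs; a minimal illustrative sketch (not the
production code):

```python
import numpy as np

def log_likelihood(nu_p_at_data):
    """Log-likelihood of the discrete kinematic data: the sum of the
    logs of the projected DF evaluated at each observed (r_p, v_p)
    pair."""
    nu = np.asarray(nu_p_at_data, dtype=float)
    if np.any(nu <= 0.0):
        return float("-inf")   # a model giving a star zero density is excluded
    return float(np.log(nu).sum())
```

Returning $-\infty$ encodes the fact that a model assigning vanishing
projected density at any observed star is rejected outright.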
\subsection{Unique features of our algorithm}
Our method is essentially an isotropic version of the above,
but there are a few important differences between the two algorithms,
as discussed below.
\begin{enumerate}
\item We derive the potential from a discretized density
distribution. The mass (including any dark matter) is assumed to be
in spherical shells with density decreasing outwards but otherwise free-form.
\item As discussed in the last section, in \citet{david}, a maximum
penalized likelihood was suggested to find out the distribution
function that the observed data was most likely to have been drawn
from. When the likelihood is highly non-linear, it cannot be
optimized by linear constraints or even by quadratic programming
\citep{david96}. In this case, the maximum of the likelihood can be
found by brute-force search \citep{davidsaha}.
However this method is rather unsatisfactory in terms of probability
of success and required CPU time. We use the Metropolis
algorithm, and attempt not only to recover the model with the maximum
likelihood, but to generate an {\it ensemble} of models distributed
according to the likelihood. This ensemble will of course be
dominated by models close to the maximum likelihood model. The
advantage of having such an ensemble is that we can estimate
uncertainties simply by measuring the spread of values across the
ensemble. Moreover, we can use discrete data without ever having to
bin them.
\item Since we have both radial and proper motion data, we realized
that we could project the distribution function into observable space
in three ways. Firstly we could project the DF on the plane of the sky
and then project it along the line-of-sight velocity axis in the phase
space. This would give the projected DF $\nu_p$, which depends
on the plane-of-sky coordinates and the line-of-sight
velocity coordinate. We could also project the DF first on the plane
of the sky and then on the plane which is perpendicular to the line of
sight velocity axis in the velocity space. The resulting projected
distribution function is then dependent on the plane of sky
coordinates and also on the velocity coordinates perpendicular to the
line of sight velocity coordinate. We name such a projected
distribution function $\nu_{\perp}$. There is a small sample of stars
for which we know both radial and transverse velocities. This allows
us to project the distribution function along each of the three
velocity axes of the phase space, subsequent to the projection on the
plane of the sky. The resulting projected distribution function will
be dependent on the apparent position on the sky and the radial and
tangential velocities. We call such a projected distribution function
$\nu_{\rm 3D}$.
\item
We incorporate the effects of the errors in the velocity measurements
by convolving the calculated projected distribution functions with error
distributions that are assumed to be Gaussian.
\item
It is rightly argued that finding the data to be inconsistent with
anisotropy is a surer check of the assumption of an isotropic
distribution function than checking for the consistency of the data
with isotropy. But the former is a much harder test to execute. Hence
we carry out our analysis by starting with the assumption of isotropy
and at the end of analysis, perform a goodness-of-fit test to check
for the validity of the assumption for the observed data sets.
\end{enumerate}
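The Gaussian error convolution of item~IV can be illustrated with a
toy one-dimensional version: a projected DF sampled on a uniform
velocity grid is smoothed by a normalized Gaussian kernel whose width
is the (assumed Gaussian) measurement error. The grid and error value
below are invented for the sketch.

```python
import numpy as np

def convolve_gaussian_errors(nu_p, dv, sigma_err):
    """Convolve a projected DF sampled on a uniform velocity grid
    (spacing dv) with a Gaussian measurement-error kernel of width
    sigma_err, normalized so the total number of stars is preserved."""
    half = int(np.ceil(5 * sigma_err / dv))    # truncate kernel at 5 sigma
    kv = np.arange(-half, half + 1) * dv
    kernel = np.exp(-0.5 * (kv / sigma_err) ** 2)
    kernel /= kernel.sum()                     # discrete normalization
    return np.convolve(nu_p, kernel, mode="same")

# toy check: an error-free "delta" DF broadens but keeps its integral
v = np.linspace(-200.0, 200.0, 401)            # velocity grid [km/s], dv = 1
nu = np.zeros_like(v)
nu[200] = 1.0                                  # all stars at v = 0
nu_obs = convolve_gaussian_errors(nu, 1.0, 20.0)
```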
\subsection{The Metropolis algorithm}
\label{sec:metropolis}
The algorithm used in our code to explore the distribution of
likelihoods is the Metropolis algorithm \citep{metrop}. The
likelihood function $L$ is a function of the potential and of the
distribution function histograms, i.e., $L$ is a function in
$n$-dimensional space, where $n$ is the number of energy bins in the
DF histogram plus the number of mass shells. Let us represent the
likelihood as $L = L({\bf x})$ where ${\bf x}\equiv(\rho,f)$.
The code samples the function $L({\bf x})$ through a series of
iterations $L({\bf x}_n)$. Two consecutive iterations are related by a
transition probability $p({\bf x}_n\rightarrow{\bf x}_{n+1})$. The
transition probabilities are chosen as
\begin{equation}
p({\bf x}\rightarrow{\bf x'}) =
{\rm min}\left( {L({\bf x'})\over L({\bf x})},1\right),
\end{equation}
although in principle any choice satisfying detailed balance
\begin{equation}
L({\bf x'})p({\bf x'}\rightarrow{\bf x}) =
L({\bf x})p({\bf x}\rightarrow{\bf x'})
\end{equation}
would serve.
\begin{figure}
\plotone{dcps.fig1.ps}
\caption{Schematic diagram to illustrate the Metropolis algorithm.
The code starts at ${\bf x}_0$, and proceeds to higher $L$ at ${\bf
x}_1$. A trial ${\bf x}_2$ at lower $L$ is rejected, and so ${\bf
x}_1$ gets repeated as ${\bf x}_2$. Next, the code goes to ${\bf
x}_3$, also at lower $L$, and then on to ${\bf x}_4$. The code
wanders around near the global maximum of $L$, the sampling being
proportional to $L$.}
\label{fig:metrofig}
\end{figure}
The algorithm is schematically represented in
Figure~\ref{fig:metrofig}. At each iteration, the code tries a new
${\bf x}$, i.e., new $\rho$ and $f$. These histograms are always
non-negative and monotonic, and with unit normalization for $f$, but
otherwise arbitrary. The trial ${\bf x}$ is accepted or rejected with
the transition probability; in case of rejection the code keeps the
old ${\bf x}$ for the next iteration.
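The accept/reject loop can be sketched in a few lines; this is a
generic Metropolis sampler with a Gaussian proposal and a toy Gaussian
target, both illustrative assumptions rather than the actual
$(\rho,f)$ moves used in our code, which must additionally preserve
non-negativity, monotonicity, and normalization.

```python
import numpy as np

def metropolis(log_L, x0, step, n_iter, rng):
    """Generic Metropolis sampler: propose a Gaussian step, accept with
    probability min(1, L'/L), and on rejection repeat the old state, so
    the chain samples states in proportion to the likelihood L."""
    x = np.asarray(x0, dtype=float)
    lp = log_L(x)
    chain = []
    for _ in range(n_iter):
        x_new = x + step * rng.normal(size=x.shape)
        lp_new = log_L(x_new)
        # working with log L avoids overflow for very peaked likelihoods
        if np.log(rng.random()) < lp_new - lp:
            x, lp = x_new, lp_new
        chain.append(x.copy())
    return np.array(chain)

# toy target: a 1-D unit-Gaussian likelihood, so the equilibrium chain
# should scatter about 0 with spread of order 1
rng = np.random.default_rng(0)
chain = metropolis(lambda x: -0.5 * float(x @ x), np.zeros(1), 1.0, 4000, rng)
```

The spread of the post-burn-in chain directly estimates the parameter
uncertainty, which is the property exploited in the mass estimates
below.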
In its early phase, Metropolis searches out the region around the
maximum of the likelihood; then the more ``equilibrium'' phase sets
in, when Metropolis wanders about the region of the
multi-dimensional energy--density space where the likelihood is close
to maximal. The extent of this wandering provides the uncertainties
in the mass estimates.
As the iterations proceed, the likelihood is monitored. Initially,
there is a trend towards increasing likelihood. When this trend
disappears, Metropolis is in the equilibrium phase. In practice we
waited till iteration $k$ such that $L({\bf x}_k)