Walkthrough 1: Learning NGC Generative Models¶
In this demonstration, we will learn how to use ngc-learn’s Model Museum to fit an NGC generative model, which is also called a generative neural coding network (GNCN), to the MNIST database. Specifically, we will focus on training three key models, each from different points in history, and estimate their marginal log likelihoods. Along the way, we will see how to fit a prior to our models and examine how a simple configuration file can be set up to allow for easy recording of experimental settings.
Concretely, after going through this demonstration, you will:
Understand how to import and train models from ngc-learn’s Model Museum.
Fit a density estimator to an NGC model’s latent space to create a prior.
Estimate the marginal log likelihood of three GNCNs using the prior-fitting scheme designed in this demonstration.
Note that the two folders of interest to this demonstration are:
walkthroughs/demo1/
: this contains the necessary scripts and configuration fileswalkthroughs/data/
: this contains a zipped copy of the MNIST database arrays
Setting Up and Training a Generative System¶
To start, navigate to the walkthroughs/
directory to access the example/demonstration
code and further enter the walkthroughs/data/
sub-folder. Unzip the file
mnist.zip
to create one more sub-folder that contains a set of numpy arrays each house
a different slice of the MNIST database, i.e., trainX.npy
and trainY.npy
compose
the training set (image patterns and their labels), validX.npy
and validY.npy
make
up the development/validation set, and testX.npy
and testY.npy
compose the test set.
Note that pixels in all image vectors have been normalized for you, to the range of [0,1].
Next, in walkthroughs/demo1/
, observe the provided script sim_train.py
, which contains the
code to execute the training process of an NGC model. Inside this file, we can export
one of three possible GNCNs from ngc-learn’s Model Museum, i.e., the
GNCN-t1 (which
is an instantiation of the model proposed in Rao & Ballard, 1999 [1]),
the GNCN-t1-Sigma (an instantiation of the model proposed in Friston
2008 [2]), and the GNCN-PDH (one of the models proposed in
Ororbia & Kifer 2022 [3]).
Importing models from the Model Museum is straightforward and only requires a few lines to be placed in the header of a training script. Notice that we import several items besides the models, including a DataLoader, like so:
from ngclearn.utils.data_utils import DataLoader
some metrics, transformations, and other I/O tools, as follows:
from ngclearn.utils.config import Config
import ngclearn.utils.transform_utils as transform
import ngclearn.utils.metric_utils as metric
import ngclearn.utils.io_utils as io_tools
where Config is an argument configuration object that
reads in values set by the user in a *.cfg
configuration file, transform_utils
contains mathematical functions to alter vectors/matrices (we will use the binarize()
function, which, inside of sim_train.py
, will convert the MNIST image patterns
to their binary equivalents), and metric_utils
contains measurement
functions (we will use the binary cross entropy routine bce()
).
Finally, we import the models themselves, as shown below:
from ngclearn.museum.gncn_t1 import GNCN_t1
from ngclearn.museum.gncn_t1_sigma import GNCN_t1_Sigma
from ngclearn.museum.gncn_pdh import GNCN_PDH
With the above imported from ngc-learn, we have everything we need to craft a full training cycle as well as track a model’s out-of-sample inference ability on validation data.
Notice in the script, at the start of our with-statement (which is used to force the following computations to reside in a particular GPU/CPU), before initializing a chosen model, we define a second special function to track another important quantity special to NGC models – the total discrepancy (ToD) – as follows:
def calc_ToD(agent):
"""Measures the total discrepancy (ToD) of a given NGC model"""
ToD = 0.0
L2 = agent.ngc_model.extract(node_name="e2", node_var_name="L")
L1 = agent.ngc_model.extract(node_name="e1", node_var_name="L")
L0 = agent.ngc_model.extract(node_name="e0", node_var_name="L")
ToD = -(L0 + L1 + L2)
return float(ToD)
This function is used to measure the internal disorder, or approximate free energy,
within an NGC model based on its error neurons (since, internally, our imported models
use the specialized ENode to create error neuron
nodes, we retrieve each node’s specialized compartment known as the scalar local
loss L
– for details on nodes and their compartments, see
Demonstration # 2 for details – but you
could also compute each local loss with distance functions, e.g.,
L2 = tf.norm( agent.ngc_model.extract(node_name="e2", node_var_name="phi(z)"), ord=2 )
).
Measuring ToD allows us to monitor the entire NGC system’s optimization
process and make sure it is behaving correctly, making progress towards reaching
a stable fixed-point.
Next, we write an evaluation function that leverages a DataLoader
and a NGC model
and returns some useful problem-specific measurements. In this demo’s case,
we want to measure and track binary cross entropy across training iterations/epochs.
The evaluation loop can be written like so:
def eval_model(agent, dataset, calc_ToD, verbose=False):
"""
Evaluates performance of agent on this fixed-point data sample
"""
ToD = 0.0 # total disrepancy over entire data pool
Lx = 0.0 # metric/loss over entire data pool
N = 0.0 # number samples seen so far
for batch in dataset:
x_name, x = batch[0]
N += x.shape[0]
x_hat = agent.settle(x) # conduct iterative inference
# update tracked fixed-point losses
Lx = tf.reduce_sum( metric.bce(x_hat, x) ) + Lx
ToD = calc_ToD(agent) + ToD # calc ToD
agent.clear()
if verbose == True:
print("\r ToD {0} Lx {1} over {2} samples...".format((ToD/(N * 1.0)), (Lx/(N * 1.0)), N),end="")
if verbose == True:
print()
Lx = Lx / N
ToD = ToD / N
return ToD, Lx
Notice that, in the above code snippet, we pass in the current NGC model (agent
),
the DataLoader (dataset
), and the ToD function we wrote earlier.
Now we have a means to measure some aspect of the generalization
ability of our NGC model, all that remains is to craft a training process loop
for NGC model. This loop could take the following form:
# create a training loop
ToD, Lx = eval_model(agent, train_set, calc_ToD, verbose=True)
vToD, vLx = eval_model(agent, dev_set, calc_ToD, verbose=True)
print("{} | ToD = {} Lx = {} ; vToD = {} vLx = {}".format(-1, ToD, Lx, vToD, vLx))
sim_start_time = time.time()
########################################################################
for i in range(num_iter): # for each training iteration/epoch
ToD = 0.0 # estimated ToD over an epoch (or whole dataset)
Lx = 0.0 # estimated total loss over epoch (or whole dataset)
n_s = 0
# run single epoch/pass/iteration through dataset
####################################################################
for batch in train_set:
n_s += batch[0][1].shape[0] # track num samples seen so far
x_name, x = batch[0]
x_hat = agent.settle(x) # conduct iterative inference
ToD_t = calc_ToD(agent) # calc ToD
Lx = tf.reduce_sum( metric.bce(x_hat, x) ) + Lx
# update synaptic parameters given current model internal state
delta = agent.calc_updates()
opt.apply_gradients(zip(delta, agent.ngc_model.theta))
agent.ngc_model.apply_constraints()
agent.clear()
ToD = ToD_t + ToD
print("\r train.ToD {0} Lx {1} with {2} samples seen...".format(
(ToD/(n_s * 1.0)), (Lx/(n_s * 1.0)), n_s),
end=""
)
####################################################################
print()
ToD = ToD / (n_s * 1.0)
Lx = Lx / (n_s * 1.0)
# evaluate generalization ability on dev set
vToD, vLx = eval_model(agent, dev_set, calc_ToD)
print("-------------------------------------------------")
print("{} | ToD = {} Lx = {} ; vToD = {} vLx = {}".format(
i, ToD, Lx, vToD, vLx)
)
The above code block represents the core training process loop but you will also
find in sim_train.py
a few other mechanisms that allow for:
model saving/check-pointing,
an early-stopping mechanism based on patience, and
some metric/ToD tracking by storing and saving updated sets of scalar lists to disk. Taking all of the above together, you can simulate the NGC training process after setting some chosen values in your
*.cfg
configuration file, which is read in near the beginning of your training script. For example, intrain_sim.py
, you will see some basic code that reads in an external experimental configuration*.cfg
file:
options, remainder = getopt.getopt(sys.argv[1:], '', ["config=","gpu_id=","n_trials="])
# GPU arguments and configuration
cfg_fname = None
use_gpu = False
n_trials = 1
gpu_id = -1
for opt, arg in options:
if opt in ("--config"):
cfg_fname = arg.strip()
elif opt in ("--gpu_id"):
gpu_id = int(arg.strip())
use_gpu = True
elif opt in ("--n_trials"):
n_trials = int(arg.strip())
mid = gpu_id
if mid >= 0:
print(" > Using GPU ID {0}".format(mid))
os.environ["CUDA_VISIBLE_DEVICES"]="{0}".format(mid)
#gpu_tag = '/GPU:0'
gpu_tag = '/GPU:0'
else:
os.environ["CUDA_VISIBLE_DEVICES"]="-1"
gpu_tag = '/CPU:0'
save_marker = 1
# load in and build the configuration object
args = Config(cfg_fname) # contains arguments for the simulation
Furthermore, notice that the above code snippet contains a bit of setup to allow you
to switch to a GPU of your choice (if you set gpu_id
to a value >= 0
)
or a CPU if no GPU is available (gpu_id
should be set to -1
). Notice that the
Config
object reads in a /path/to/file_name.cfg
text file and produces a
queryable object that is backed by a dictionary/hash-table.
The above code blocks/snippets can be found in train_sim.py
which has been written
for you to study and use/run along with the provided example configuration scripts
(notice for each model there one subfolder in /walkthroughs/demo1/
for each of the
three models for you to train, each with their own training script fit.cfg
and
analysis.cfg
script).
Let us go ahead and train one of each the three models that we imported into our
training script at the start of this demonstration.
Run the following three commands as shown below:
$ python sim_train.py --config==gncn_t1/fit.cfg --gpu_id=0 --n_trials=1
$ python sim_train.py --config==gncn_t1_sigma/fit.cfg --gpu_id=0 --n_trials=1
$ python sim_train.py --config==gncn_pdh/fit.cfg --gpu_id=0 --n_trials=1
Alternatively, you can also just run the global bash script exec_experiments.sh
that we have also provided in /walkthroughs/demo1/
, which will just simply execute the above
three experiments sequentially for you. Run this bash script like so:
$ ./exec_experiments.sh
As an individual script runs, you will see printed to the terminal, after each epoch,
an estimate of the ToD and the BCE over the full training sample as well as the
measured validation ToD and BCE over the full development data subset. Note that our
training script retrieves from the walkthroughs/data/mnist/
folder (that you unzipped earlier)
only the training arrays, i.e., trainX.npy
and trainY.npy
, and the
validation set arrays, i.e., validX.npy
and validY.npy
. We will use the
test set arrays testX.npy
and testY.npy
in a follow-up analysis once we have
trained each of our models above. After all the processes/scripts terminate, you can
check inside each of the model folders, i.e., walkthroughs/demo1/gncn_t1/
, walkthroughs/demo1/gncn_t1_sigma/
,
and walkthroughs/demo1/gncn_pdh/
, and see that your script(s) saved/serialized to disk a
few useful files:
Lx0.npy
: the BCE training loss for the training set over epochToD0.npy
: the ToD measurement for the training set over epochvLx0.npy
: the validation BCE loss over epochvToD0.npy
: the validation ToD measurement loss over epochmodel0.ngc
: your saved/serialized NGC model (with best validation performance)
You can use then plot the numpy arrays using matplotlib
or your favorite
visualization library/package to create curves for each measurement over epoch.
The final object, the model object model0.ngc
, is what we will use in the next section to
quantitatively evaluate how well our NGC models work as a generative models.
Analyzing a Trained Generative Model¶
Now that you have trained three NGC generative models, we now want to analyze them a bit further, beyond just the total discrepancy and binary cross entropy (the latter of which just tells you how good the model is at auto-associative reconstruction of samples of binary-valued data).
In particular, we are interested in measuring the marginal log likelihood of
our models, or log p(x)
. In general, calculating such a quantity exactly is
intractable for any reasonably-sized model since we would have marginalize out
the latent variables Z = {z1, z2, z3}
of each our generative models, requiring
us to evaluate an integral over a continuous space. However, though things seem
bleak, we can approximate this marginal by resorting to a Monte Carlo estimate and
simply draw as many samples as we need (or can computationally handle) from the
underlying prior distribution inherent to our NGC model in order to calculate log p(x)
.
Since the NGC models that we have designed/trained in this demo embody an underlying
directed generative model, i.e., p(x,Z) = p(x|z1) p(z1|z2) p(z2|z3) p(z3)
, we
can use efficient ancestral sampling to produce fantasy image samples after we
query the underlying latent prior p(z3)
.
Unfortunately, unlike models such as the variational autoencoder (VAE), we do not
have an explicitly defined prior distribution, such as a standard multivariate Gaussian,
that makes later sampling simple (the VAE is trained with an encoder that
forces the generative model to stick as close as it can to this imposed prior).
An NGC model’s latent prior is, in contrast to the VAE, multimodal and thus
simply using a standard Gaussian will not quite produce the fantasized samples
we would expect.
Nevertheless, we can, in fact, far more accurately capture an NGC model’s prior p(z3)
by treating it as a mixture of Gaussians and instead estimate its multimodal density
with a Gaussian mixture model (GMM). Once we have this learned GMM prior, we can
sample from this model of p(z3)
and run these samples through the NGC graph
via ancestral sampling (using the prebuilt ancestral projection function project()
).
ngc-learn is designed to offer some basic support for density estimation, and
for this demonstration, we will import and use its GMM density estimator (which
builds on top of scikit-learn’s GMM base model), i.e., ngclearn.density.gmm
.
First, we will need to extract the latent variables from the trained NGC model,
which simply requires us to adapt our eval_model()
function in our training script
to also now return a design matrix where each row contains one latent code vector produced
by our model per data point in a sample pool.
Specifically, all we need to write is the following:
def extract_latents(agent, dataset, calc_ToD, verbose=False):
"""
Extracts latent activities of an agent on a fixed-point data sample
"""
latents = None
ToD = 0.0
Lx = 0.0
N = 0.0
for batch in dataset:
x_name, x = batch[0]
N += x.shape[0]
x_hat = agent.settle(x) # conduct iterative inference
lats = agent.ngc_model.extract(node_name, cmpt_name)
if latents is not None:
latents = tf.concat([latents,lats],axis=0)
else:
latents = lats
ToD_t = calc_ToD(agent) # calc ToD
# update tracked fixed-point losses
Lx = tf.reduce_sum( metric.bce(x_hat, x) ) + Lx
ToD = calc_ToD(agent) + ToD
agent.clear()
print("\r ToD {0} Lx {1} over {2} samples...".format((ToD/(N * 1.0)), (Lx/(N * 1.0)), N),end="")
print()
Lx = Lx / N
ToD = ToD / N
return latents, ToD, Lx
Notice we still keep our measurement of the ToD and BCE just as an extra
sanity check to make sure that any model we de-serialize from disk yields values
similar to what we measured during our training process.
Armed with the extraction function above, we can gather the latent codes of
our NGC model. Notice that in the provided walkthroughs/demo1/extract_latents.py
script,
you will find the above function fully integrated and used.
Go ahead and run the extraction script for the first of your three models:
$ python extract_latents.py --config==gncn_t1/analyze.cfg --gpu_id=0
and you will now find inside the folder walkthroughs/demo1/gncn_t1/
a new numpy array
file z3_0.npy
, which contains all of the latent variables for the top-most layer
of your GNCN-t1
model (you can examine the configuration file analyze.cfg
to see
what arguments we set to achieve this).
Now it is time to fit the GMM prior. In the fit_gmm.py
script, we have set
up the necessary framework for you to do so (using 18,000
samples from the
training set to speed up calculations a bit). All you need to do at this point,
still using the analyze.cfg
configuration, is execute this script like so:
$ python fit_gmm.py --config==gncn_t1/analyze.cfg --gpu_id=0
and after your fitting process script terminates, you will see inside your model
directory walkthroughs/demo1/gncn_t1/
that you have a de-serialized learned
prior prior0.gmm
.
With this prior model prior0.gmm
and your previously trained NGC system
model0.ngc
, you are ready to finally estimate your marginal log likelihood log p(x)
.
The final script provided for you, i.e., walkthroughs/demo1/eval_logpx.py
, will
do this for you. It simply takes your full system – the prior and the model – and
calculates a Monte Carlo estimate of its log likelihood using the test set.
Run this script as follows:
$ python eval_logpx.py --config==gncn_t1/analyze.cfg --gpu_id=0
and after it completes (this step can take a bit more time than the other steps,
since we are computing our estimate over quite a few samples), in addition to
outputting to I/O the calculated log p(x)
, you will see two more items in your
model folder walkthroughs/demo1/gncn_t1/
:
logpx_results.txt
: the recorded marginal log likelihoodsamples.png
: some visual samples stored in an image array for you to view/assess
If you cat
the first item, you should something similar to the following (which
should be the same as what was printed to I/O when you evaluation script finished):
$ cat gncn_t1/logpx_results.txt
Likelihood Test:
log[p(x)] = -103.63043212890625
and if you open and view the image samples, you should see something similar to:
Now go ahead and re-run the same steps above but for your other two models, using
the final provided configuration scripts, i.e., gncn_t1_sigma/analyze.cfg
and
gncn_pdh/analyze.cfg
. You should see similar outputs as below.
For the GNCN-t1-Sigma, you get a log likelihood of:
$ cat gncn_t1_sigma/logpx_results.txt
Likelihood Test:
log[p(x)] = -99.73319244384766
with images as follows:
For the GNCN-PDH, you get a log likelihood of:
$ cat gncn_t1_sigma/logpx_results.txt
Likelihood Test:
log[p(x)] = -97.39334106445312
with images as follows:
For the three models above, we get log likelihood measurements that are desirably within the right ballpack of those reported in related literature [3].
You have now successfully trained three different, powerful NGC generative models using ngc-learn’s Model Museum and analyzed their log likelihoods. While these models work/perform well, there is certainly great potential for further improvement, extensions, and alternative ideas and it is our hope that future research developments will take advantage of the tools/framework provided by ngc-learn to produce even better results.
Note: Setting the --n_trials
flag to a number greater than one for the
above training scripts to run each simulation for multiple, different experimental
trials (each set of files/objects will be indexed by its trial number).
References¶
[1] Rao, Rajesh PN, and Dana H. Ballard. “Predictive coding in the visual cortex:
a functional interpretation of some extra-classical receptive-field effects.”
Nature Neuroscience 2.1 (1999): 79-87.
[2] Friston, Karl. “Hierarchical models in the brain.” PLoS Computational
Biology 4.11 (2008): e1000211.
[3] Ororbia, A., and Kifer, D. The neural coding framework for learning
generative models. Nature Communications 13, 2064 (2022).