I’ve linked to the individual videos below, along with my notes, but in summary:
- Machine Learning for Computational Biology and Health is a wide-ranging tour of ML in those domains.
- From System 1 Deep Learning to System 2 Deep Learning outlines a roadmap for deep learning to address symbolic (conscious) reasoning.
- Representation Learning and Fairness presents a modular framework for algorithmic fairness.
Machine Learning for Computational Biology and Health (Anna Goldenberg, Barbara Engelhardt)
A two-hour firehose tutorial on the application of ML in biology (research) and clinical settings.
It’s great to get a run-down on the challenges (large feature sets with small sample sizes, and missing data that is not missing at random, to name two) and a tour of applications. And it’s an extensive tour: genetics, epigenetics, transcriptomics, proteomics, microbiome (briefly).
The second half focuses more on clinical data: patient modelling, predicting (for example) heart attacks, and problems and lessons from working with clinicians. This quote was great, as a reminder that nicely cleaned-up data isn’t going to cut it in the clinic:
“This is actually a must. If you’re working in a clinical context, irregularly sampled data is the only kind of data you will get. Irregular sampled with messiness. There is no way around that.” (1:27:25)
Papers and venues I need to chase down:
- Evaluation of Deep Learning Strategies for Nucleus Segmentation in Fluorescence Images
- Why most discovered true associations are inflated
- ChromHMM: automating chromatin state discovery and characterization
- SAIL: Symposium on Artificial Intelligence for Learning Health Systems
- ACM CHIL: ACM Conference on Health, Inference, and Learning
- ML4HC: Machine Learning for Healthcare
- ProbGen: Probabilistic Modeling in Genomics Conference
- Fair ML for health
Watch the recording.
From System 1 Deep Learning to System 2 Deep Learning (Yoshua Bengio)
It’s been a long time since I heard anyone talking about how symbolic processing fits into neural-style computation. There’s been a lot going on in this area, but the last time I touched on this seriously was back in the 1990s with Pollack’s Recursive Distributed Representations work.
The talk outlines a series of challenges, and a way through them, for deep learning to move to system 2 cognition. In particular, it’s not a matter of getting more data and building bigger models. Instead, it’s looking at variables and trying to capture higher-level causation (beyond what you can find in pixel-level data).
Interestingly, Bengio isn’t looking at bolting symbolic systems on top of networks. He wants to implement symbolic processing in a neural architecture.
Papers for me to follow up on:
- Bengio (2017) The Consciousness Prior
- Bengio, Bengio & Cloutier (1991) Learning a synaptic learning rule
- Ortega et al. (2019) Meta-learning of Sequential Strategies
- Bengio et al. (2019) A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms
Here’s the recording.
Representation Learning and Fairness (Sanmi Koyejo)
The focus here is on separating fairness from the machine learning model. The fairness parts (data regulator, data producer) sit before the model, so that the model can ignore fairness. It’s a fascinating idea explored in this two-hour tutorial.
It is not magic: someone has to define fairness, and that is the job of the “data regulator”. One example given was individual fairness, where similar inputs (according to some metric you define) should produce similar outputs.
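To make the individual fairness idea concrete, here is a minimal sketch of how it could be checked. It assumes one common formalisation, a Lipschitz condition in which the difference in outputs must not exceed a constant times the distance between inputs; that may not be the exact form used in the tutorial. The function names, the Euclidean metric, the applicant data and the scores are all hypothetical.

```python
from itertools import combinations
import math


def euclidean(a, b):
    """Assumed similarity metric; a real data regulator would choose this."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def individual_fairness_violations(inputs, outputs, metric=euclidean, lipschitz=1.0):
    """Return index pairs (i, j) whose outputs differ by more than
    lipschitz * metric(inputs[i], inputs[j])."""
    violations = []
    for i, j in combinations(range(len(inputs)), 2):
        allowed = lipschitz * metric(inputs[i], inputs[j])
        if abs(outputs[i] - outputs[j]) > allowed:
            violations.append((i, j))
    return violations


# Hypothetical applicants described by (income, debt), both scaled to [0, 1].
applicants = [(0.50, 0.20), (0.52, 0.21), (0.90, 0.10)]
# Hypothetical model scores: the first two applicants are nearly identical
# but receive very different scores.
scores = [0.60, 0.10, 0.40]

violations = individual_fairness_violations(applicants, scores)
print(violations)
```

The hard part is not this loop but choosing the metric: deciding which individuals count as “similar” is exactly the judgement the data regulator has to make.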
With a way to measure fairness, the “data producer” can then transform data sets to learn a “fair” representation. This is the point where the “representation learning” of the title comes into things. Representation learning is one way to compute a data summary. That means going from a high dimensional representation to a lower one (like PCA). What’s supposed to happen is points that should be treated similarly are moved to be closer together in the representation space. Learning this transformation is the job of the “data producer”.
The “data user” learns a model to perform some task (credit rating, for example) using the sanitised data from the data producer. You’ll want to audit the data user (e.g., the predictions made) to check they are fair as defined by the data regulator.
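As a rough illustration of what such an audit might look like, the sketch below compares the rate of positive decisions across groups. This assumes the data regulator chose a group criterion (demographic parity), which is my assumption rather than something from the tutorial; the function names, decisions, group labels and tolerance are all hypothetical.

```python
def positive_rate(decisions):
    """Fraction of decisions that are positive (1)."""
    return sum(decisions) / len(decisions)


def demographic_parity_gap(decisions, groups):
    """Return (gap, rates): the largest difference in positive-decision
    rate between any two groups, and the per-group rates."""
    by_group = {}
    for decision, group in zip(decisions, groups):
        by_group.setdefault(group, []).append(decision)
    rates = {g: positive_rate(ds) for g, ds in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical predictions from the data user (1 = credit approved).
decisions = [1, 0, 1, 1, 0, 1, 0, 0]
# Hypothetical group labels held by the auditor, not seen by the model.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap, rates = demographic_parity_gap(decisions, groups)
tolerance = 0.1  # assumed threshold; a regulator would set this
passes_audit = gap <= tolerance
print(rates, gap, passes_audit)
```

In this made-up example the approval rates differ a lot between the two groups, so the audit fails.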
It appears the trick here is trading off fairness and performance.
- McNamara, Ong and Williamson (2019) Costs and Benefits of Fair Representation Learning
- Hiranandani et al. (2019) Multiclass Performance Metric Elicitation
- Weinberger and Saul (2009) Distance Metric Learning for Large Margin Nearest Neighbor Classification
- Zemel et al. (2013) Learning Fair Representations
- Louizos et al. (2016) The Variational Fair Autoencoder
- Song et al. (2018) Learning Controllable Fair Representations
- Creager et al. (2019) Flexibly Fair Representation Learning by Disentanglement
- Locatello et al. (2019) On the Fairness of Disentangled Representations
This is an entire area I didn’t realise was so far advanced. Go watch the tutorial if you’re interested. It’s really well done.
There’s much more
There are over 250 presentations online already. Topics cover technical aspects of ML, but there are also many domain-specific talks: climate change, creativity and design, and what seems to me a lot of health and biology content.