Differentiable Programming – A Simple Introduction
159 points • 49 comments
Since 9/9/2014, 5:46:44 PM, @dylanbfox has earned 395 karma points across 97 contributions.
Recent @dylanbfox Activity
Is hybrid work the worst of both worlds?
3 points • 0 comments
Is light pollution a problem? [video]
1 point • 0 comments
Looks interesting! Do you guys offer any sort of visibility tools/reports into customer usage of different endpoints, tracking of actual API requests (including payloads, etc.), and so on? Is it possible to use you guys just for billing if we already have our own auth?
Agree. The places I go on the web have become more and more centralized/limited. I think projects like this that help to surface and aggregate interesting content from the web (which is really what I come to HN to find) are great.
Space pictures from the Voyager 1 and 2 missions
1 point • 0 comments
Nice work! The UI is really simple, and I love not having to log in to use it. Have you thought about leveraging the ListenNotes API (https://www.listennotes.com/api/) to automatically pull in podcast episodes via search vs having to upload them?
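In case it helps, here's a rough sketch of what an episode search against the ListenNotes v2 API could look like (the endpoint, header, and response field names are from memory, so double-check them against the docs):

    import requests

    # Sketch of an episode search via the ListenNotes v2 API. Endpoint,
    # header name, and response fields below are my recollection of the
    # docs at https://www.listennotes.com/api/ - verify before relying on them.
    API_KEY = "YOUR_LISTENNOTES_KEY"  # placeholder

    resp = requests.get(
        "https://listen-api.listennotes.com/api/v2/search",
        headers={"X-ListenAPI-Key": API_KEY},
        params={"q": "machine learning", "type": "episode"},
    )
    resp.raise_for_status()

    for episode in resp.json().get("results", [])[:5]:
        # "title_original" and "audio" (a direct audio URL) are the fields
        # I believe the search response exposes per episode.
        print(episode.get("title_original"), episode.get("audio"))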
I agree with all your points, but one thing I think about is: how do we fix what we have today? How do you fix the concrete jungles that most US cities are today? Or is it inevitable that more concrete will just be poured over time until some major natural disaster allows for a reset?
Text Segmentation – Approaches, Datasets, and Evaluation Metrics
5 points • 3 comments
An Overview of Transducer Models for ASR
8 points • 0 comments
Hi there - OP here - thanks for reading!
This blog is more of an intro to a few high-level concepts (multi-GPU and multi-node training, fp32 vs fp16, buying hardware and dedicated machines vs AWS/GCP, etc.) for startups that are early in their deep learning journey and might need a nudge in the right direction.
If you're looking for a deep dive into the best GPUs to buy (cost/perf, etc.), the link in the comment below gives a pretty good overview.
PS - I can send you some benchmarks we did that show (at least for us) Horovod is ~10% faster than DDP for multi-node training FWIW. Email is in my profile!
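To make the fp32 vs fp16 point concrete, here's a minimal mixed-precision training sketch using PyTorch's torch.cuda.amp (the model and data are toy placeholders, not anything from our stack):

    import torch
    from torch import nn
    from torch.cuda.amp import autocast, GradScaler

    # Toy model/data just to show the amp pattern; requires a CUDA GPU.
    model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scaler = GradScaler()  # loss scaling avoids fp16 gradient underflow

    for step in range(100):
        x = torch.randn(32, 512, device="cuda")
        y = torch.randint(0, 10, (32,), device="cuda")
        optimizer.zero_grad()
        with autocast():                       # run ops in fp16 where safe
            loss = nn.functional.cross_entropy(model(x), y)
        scaler.scale(loss).backward()          # backward on the scaled loss
        scaler.step(optimizer)                 # unscales grads; skips step on inf/NaN
        scaler.update()                        # adjusts the scale factor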
Author here. Thanks for your comments!
In general - this is expensive stuff. Training big, accurate models just requires a lot of compute, and there is a "barrier to entry" wrt costs, even if you're able to get those costs down. I think it's similar to how startups can't really get into the aerospace industry unless they raise lots of funding (e.g., Boom Supersonic).
Practically speaking though, for startups without funding or access to cloud credits, my advice would be to train the best model you can with the compute resources you have available. Try to close your first customer with an "MVP" model. Even if your model is not good enough for most customers, you can close one, get some incremental revenue, and keep iterating.
When we first started (2017), I trained models that were ~1/10 the size of our current models on a few K80s in AWS. Those models were much worse than our models today, but they helped us make incremental progress to get to where we are now.
Dylan from Assembly here. If you want to send me one of your audio files (my email is in my profile) I'd be happy to send you back the diarized results from our API.
You can also sign up for a free account and test from the dashboard without having to write any code, if that's easier.
Other than lots of crosstalk in your group conversations, is there anything else challenging about your audio (e.g., distance from microphones, background noise, etc.)?
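For reference, requesting a diarized transcript from the API looks roughly like this (a sketch; parameter and field names are from the v2 docs as I recall them, so check the current docs):

    import time
    import requests

    API_KEY = "YOUR_ASSEMBLYAI_KEY"  # placeholder
    HEADERS = {"authorization": API_KEY}

    # Submit a transcription job with speaker diarization enabled.
    job = requests.post(
        "https://api.assemblyai.com/v2/transcript",
        headers=HEADERS,
        json={"audio_url": "https://example.com/meeting.mp3",  # your file's URL
              "speaker_labels": True},                         # diarization
    ).json()

    # Poll until the transcript is ready.
    while True:
        result = requests.get(
            f"https://api.assemblyai.com/v2/transcript/{job['id']}",
            headers=HEADERS,
        ).json()
        if result["status"] in ("completed", "error"):
            break
        time.sleep(5)

    # Each utterance carries a speaker label and its text.
    for utt in result.get("utterances") or []:
        print(f"Speaker {utt['speaker']}: {utt['text']}")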
Great question. This is technically referred to as "Wake Word Detection". You run a really small model locally that just processes 500ms (for example) of audio at a time through a lightweight CNN or RNN. The idea here is that it's just binary classification (vs actual speech recognition).
There are some open source libraries that make this relatively easy:
- https://github.com/Kitt-AI/snowboy (looks to be shut down now)
- https://github.com/cmusphinx/pocketsphinx
This avoids having to stream audio 24x7 to a cloud model, which would be super expensive. That said, AFAIK what Alexa does, for example, is send any positive wake word detection to a bigger, more accurate cloud model to verify the local model's prediction.
Once you're confident a wake word was detected, that's when you start streaming to an accurate cloud-based transcription model like Assembly, to minimize costs!
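To make the binary-classification framing concrete, here's a toy PyTorch sketch of such a model (the architecture and the 500ms input shape, 40 mel bands x 50 frames, are made up for illustration):

    import torch
    from torch import nn

    # Toy wake word detector: classifies one ~500ms audio window as
    # "wake word" vs "not". Shapes/architecture are illustrative only.
    class WakeWordNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.fc = nn.Linear(32, 1)  # single logit: wake word or not

        def forward(self, x):  # x: (batch, 1, mel_bands, frames)
            return torch.sigmoid(self.fc(self.conv(x).flatten(1)))

    model = WakeWordNet()
    window = torch.randn(1, 1, 40, 50)  # one 500ms mel-spectrogram window
    print(model(window))                # probability the wake word fired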
Interesting. How do you guys manage spot interruptions when training on spot instances?
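(No answer appears in the thread, but the usual pattern for spot instances is to checkpoint frequently and resume on restart; a rough PyTorch sketch:)

    import os
    import torch

    CKPT_PATH = "checkpoint.pt"

    def save_checkpoint(model, optimizer, step):
        tmp = CKPT_PATH + ".tmp"
        torch.save({"model": model.state_dict(),
                    "optim": optimizer.state_dict(),
                    "step": step}, tmp)
        os.replace(tmp, CKPT_PATH)  # atomic rename: safe if killed mid-write

    def load_checkpoint(model, optimizer):
        if not os.path.exists(CKPT_PATH):
            return 0  # fresh start
        state = torch.load(CKPT_PATH)
        model.load_state_dict(state["model"])
        optimizer.load_state_dict(state["optim"])
        return state["step"]  # resume the training loop from here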
This is tricky. The de facto metric to evaluate an ASR model is Word Error Rate (WER). But results can vary widely depending on the pre-processing that's done (or not done) to transcription text before calculating a WER.
For example, if you take the WER between "I live in New York" and "i live in new york", the WER would be 60%: you're comparing a capitalized version against an uncapitalized one, so three of the five words count as substitution errors.
This is why public WER results vary so widely.
We publish our own WER results and normalize the human and automatic transcription text as much as possible to get as close to "true" numbers as possible. But in reality, we see a lot of people comparing ASR services simply by doing diffs of transcripts.
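To make the normalization point concrete, here's a minimal word-level WER sketch (lowercasing only; real pipelines also normalize punctuation, numbers, etc.):

    # Word-level WER via edit distance; shows why the example above
    # scores 60% without normalization and 0% with it.
    def wer(ref: str, hyp: str, normalize: bool = True) -> float:
        if normalize:
            ref, hyp = ref.lower(), hyp.lower()
        r, h = ref.split(), hyp.split()
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i
        for j in range(len(h) + 1):
            d[0][j] = j
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                cost = 0 if r[i - 1] == h[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(r)][len(h)] / len(r)

    print(wer("I live in New York", "i live in new york", normalize=False))  # 0.6
    print(wer("I live in New York", "i live in new york"))                   # 0.0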
> Salary costs are probably even higher than compute costs.
Yes exactly. Managing that much compute requires many humans!
Dylan from Assembly here. Most of our customers have actually switched over to us from Google - this Launch HN from a YC startup that uses our API goes into a bit more detail if you're interested:
https://news.ycombinator.com/item?id=26251322
My email is in my profile if you want to reach out to chat more!
How to train large deep learning models as a startup
273 points • 81 comments
Interesting. It seems like in the "real world" WER is not really the metric that matters; it's more about "is this ASR system performing well for my use case", which is better measured through task-specific metrics like the one you outlined in your paper.