This article strongly resonates with me (thanks OP!). Models trained on huge datasets are genuinely impressive, and it's easy to jump on the hype train of "BIG MODEL GO VROOM" and overlook the cool / useful things one can do with a little data and solid domain knowledge.
Transfer learning can be immensely useful when applicable, but often it isn't (e.g. imagine a medical domain where you track a patient through some process, recording procedure choices, measurements, outcomes, etc.; it can be difficult to find relevant data elsewhere to transfer from).
Some approaches I've found useful:
* Get to know the domain really well
* It's not a lot of data, and that's an opportunity: rich interactive visualizations can cover all of it, letting you get to know the data quite well and grok how it relates to the domain knowledge (see the plotly sketch after this list)
* Following the advice that ML models in production could/should start with simple heuristics, view your model more as an augmented heuristic than as a powerful model that solves everything. That also means figuring out how to catch and handle the cases where it's wrong, which is something one ought to do anyway (see the fallback sketch below)
* Invest in tailoring priors to the problem, based on your domain knowledge and understanding of the data. This can range from writing your own loss function to training an ad-hoc type of model not based on DL, e.g. using metaheuristics (genetic algorithms, simulated annealing, etc.). The advantage of small data is that evaluating on all of it is fast, so ad-hoc models using nonstandard techniques can realistically be optimized (sometimes, depending on context of course); the simulated-annealing sketch below is a toy example.
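To make the visualization point concrete: with a few hundred rows, even a plotly scatter matrix already lets you hover on every single point while you build intuition. A rough sketch, with made-up column names for the medical example above (not a real schema):

```python
import pandas as pd
import plotly.express as px

# Hypothetical small dataset: one row per patient episode (column names are illustrative).
df = pd.read_csv("episodes.csv")

# A few hundred rows stay perfectly responsive in an interactive scatter matrix,
# and hovering lets you inspect individual cases against your domain knowledge.
fig = px.scatter_matrix(
    df,
    dimensions=["age", "dose", "days_to_recovery"],
    color="procedure",
    hover_data=list(df.columns),
)
fig.show()
```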
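For the "augmented heuristic" point, a rough sketch of the shape I have in mind: the model only overrides the simple rule when it's confident, everything else falls back. The threshold, feature keys and the sklearn-style classifier are placeholders, not a recipe:

```python
import numpy as np

CONFIDENCE_THRESHOLD = 0.8  # placeholder; in practice tuned on held-out cases

def heuristic_predict(case):
    # Whatever simple rule you would have shipped before the model existed.
    return int(case["some_measurement"] > 42.0)

def predict(model, case):
    """Let the model override the heuristic only when it is confident."""
    proba = model.predict_proba(np.asarray([case["features"]]))[0]
    if proba.max() >= CONFIDENCE_THRESHOLD:
        return int(proba.argmax())
    # Low confidence (or a weird input): fall back to the known-simple behaviour.
    # This is also a natural place to log the case for later inspection.
    return heuristic_predict(case)
```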
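And for the last point, a toy sketch of fitting a tiny linear rule with a custom, domain-informed loss via simulated annealing (scipy's dual_annealing). The synthetic data and the cost weights are purely illustrative; the point is that a non-differentiable, hand-written loss is perfectly workable when each evaluation only touches a couple hundred rows:

```python
import numpy as np
from scipy.optimize import dual_annealing

# Toy stand-in for a small dataset: X (n_samples, n_features), y in {0, 1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

def asymmetric_loss(params):
    """Custom loss encoding a domain prior: missing a positive case costs
    5x more than a false alarm (the weights are illustrative)."""
    w, b = params[:-1], params[-1]
    pred = (X @ w + b > 0).astype(int)
    false_neg = np.sum((y == 1) & (pred == 0))
    false_pos = np.sum((y == 0) & (pred == 1))
    return 5.0 * false_neg + 1.0 * false_pos

# With ~200 rows each evaluation is microseconds, so a metaheuristic can
# afford thousands of full-dataset evaluations of a non-differentiable loss.
bounds = [(-5, 5)] * (X.shape[1] + 1)
result = dual_annealing(asymmetric_loss, bounds, seed=0)
print(result.x, result.fun)
```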