On Tuesday this week Data Science SG organised a meetup at Google's premises in Singapore.
Background
Google offers APIs for Machine Learning (ML APIs), which expose pre-trained models. Out of the box, the ML APIs let you, for instance, categorise a sentence from a news article or tell a picture of a cat from one of a turtle.
At the other end of the spectrum, Google provides TensorFlow, which lets you build your neural network model from scratch: this involves tinkering with layers, biases and functions, and providing your own training data.
Halfway in between is AutoML, where you adapt the pre-trained models to your own data using a technique called transfer learning. With transfer learning, the first layers of the neural network are kept as they are, while the last layers are modified and re-trained with your own data.
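To make the idea of transfer learning concrete, here is a minimal sketch in plain Python. It is not how AutoML is implemented: the "pre-trained" weights in `BASE_WEIGHTS`, the tiny dataset in `DATA` and all function names are made up for illustration. The point is only that the first layer stays frozen while the last layer is trained on new data.

```python
import math

# Stand-in for the "pre-trained" first layer. These weights are frozen:
# nothing below ever modifies them. The values are invented for this sketch.
BASE_WEIGHTS = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]


def base_features(x):
    """Frozen layer: a linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in BASE_WEIGHTS]


def predict(head, bias, x):
    """Trainable last layer: logistic regression on top of the frozen features."""
    z = sum(h * f for h, f in zip(head, base_features(x))) + bias
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(head, bias, data):
    """Average cross-entropy of the model over the dataset."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict(head, bias, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train_head(data, epochs=200, lr=0.5):
    """Re-train only the last layer with stochastic gradient descent."""
    head = [0.0] * len(BASE_WEIGHTS)
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = base_features(x)
            error = predict(head, bias, x) - y
            head = [h - lr * error * f for h, f in zip(head, feats)]
            bias -= lr * error
    return head, bias


# Hypothetical "own data": label 1 when the first input exceeds the second.
DATA = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([2.0, 1.0], 1), ([1.0, 2.0], 0)]
```

Calling `train_head(DATA)` returns the weights of the new last layer only; `BASE_WEIGHTS` is left untouched, which is the "kept" part of the model.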
The three speakers were Google developer advocates. What is a "developer advocate"? Well, apparently it is someone who toys around with Google APIs mostly for fun and gives demos about them, which sounds like a pretty cool job to me.
Image recognition
Markku Lepisto (@markkulepisto) showed how he trained AutoML to identify stamps on letters received by banks. Because he didn't have the original scans of actual customer data, he wrote a small Python program to automatically generate training data by combining images of stamps with random pages from scanned books downloaded from Project Gutenberg.
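Markku's program was not shown in detail, so the following is only a rough sketch of the general idea, under simplifying assumptions: images are plain nested lists of greyscale values rather than real scans, and the names `make_page`, `paste_stamp` and `generate_examples` are invented here. A real version would load stamp and page images with an imaging library instead.

```python
import random


def make_page(width, height, rng):
    """Stand-in for a scanned book page: a grid of light grey pixel values."""
    return [[rng.randint(200, 255) for _ in range(width)] for _ in range(height)]


def paste_stamp(page, stamp, rng):
    """Paste the stamp at a random position on a copy of the page.

    Returns the new image and the bounding box (left, top, right, bottom),
    which can serve as the label for that training example.
    """
    page_h, page_w = len(page), len(page[0])
    stamp_h, stamp_w = len(stamp), len(stamp[0])
    top = rng.randint(0, page_h - stamp_h)
    left = rng.randint(0, page_w - stamp_w)
    image = [row[:] for row in page]  # copy so the original page is untouched
    for dy, stamp_row in enumerate(stamp):
        image[top + dy][left:left + stamp_w] = stamp_row
    return image, (left, top, left + stamp_w, top + stamp_h)


def generate_examples(stamp, count, width=40, height=60, seed=0):
    """Generate `count` (image, bounding_box) pairs with a seeded random generator."""
    rng = random.Random(seed)
    return [paste_stamp(make_page(width, height, rng), stamp, rng) for _ in range(count)]
```

For example, `generate_examples([[0] * 5 for _ in range(4)], 3)` yields three 40x60 "pages", each with a dark 5x4 "stamp" somewhere on it and the matching bounding box.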
Sara Robinson (@srobtweets) briefly described the Google Natural Language API, which is pre-trained to do categorisation (for instance, working out that a sentence from a news article is about baseball) as well as simple entity and sentiment extraction (for instance, extracting entities and positive or negative sentiment from a customer review). The online tool also lets you visualise the syntax of a sentence (its grammatical breakdown).
Then Sara asked: how would you go about building a model using your own custom categories?
She showed how to do it using TensorFlow, which comes with Python and C++ interfaces. Some of the APIs are low-level (you construct the model layer by layer), while others are higher-level wrappers such as Keras.
For training data, Google BigQuery has large datasets ready to be used, such as a list of Stack Overflow questions along with their tags (C#, SQL, JavaScript, ...).
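Before such questions can be fed to a model with custom categories, the tags have to be turned into numeric labels. The sketch below shows one common way to do that (multi-hot encoding). It assumes each question's tags arrive as a single pipe-separated string such as `"javascript|sql"`; that format, the `VOCABULARY` list and the `encode_tags` helper are assumptions for illustration, not something shown in the talk.

```python
# Hypothetical list of the custom categories the model should predict.
VOCABULARY = ["c#", "sql", "javascript"]


def encode_tags(tag_string, vocabulary=VOCABULARY, separator="|"):
    """Turn a tag string into a multi-hot vector: one 0/1 entry per known tag."""
    tags = set(tag_string.lower().split(separator))
    return [1 if tag in tags else 0 for tag in vocabulary]
```

A question tagged `"javascript|sql"` becomes `[0, 1, 1]`, and a question with none of the known tags becomes `[0, 0, 0]`.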
The last part of the evening was left to Allen Day (@allenday) in front of a hungry audience.
Allen decided to use BigQuery to analyse transaction data from the Bitcoin and Ethereum blockchains.
BigQuery synchronises with the Bitcoin ledger every 10 minutes, which is good enough to have a near real-time view of the Bitcoin blockchain.
Allen used Gephi to develop interesting visualisations showing the movement of funds between wallets... More details here.
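As a rough idea of how transaction rows could be prepared for a tool like Gephi, the sketch below aggregates transfers between wallets into a weighted edge list and writes it as CSV with `Source`, `Target` and `Weight` columns, a layout Gephi's spreadsheet import understands. The wallet names, amounts and helper functions are invented for illustration; this is not Allen's actual pipeline.

```python
import csv
import io
from collections import defaultdict


def build_edges(transactions):
    """Sum the amounts sent between each (sender, receiver) pair of wallets."""
    totals = defaultdict(float)
    for sender, receiver, amount in transactions:
        totals[(sender, receiver)] += amount
    return sorted(totals.items())


def edges_to_csv(edges):
    """Serialise the edges as CSV text with Source, Target and Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (sender, receiver), total in edges:
        writer.writerow([sender, receiver, total])
    return buffer.getvalue()
```

Each row of the resulting file is one directed edge, so repeated transfers between the same two wallets show up as a single, heavier edge in the graph.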
This is a Google Easter Bunny