AI/ML

Turn seemingly distant, impossible capabilities into reality and push the boundaries of our imagination in realizing what is possible within the technology field, with AI at its foundation.

"An old friend once told me something that gave me great comfort. Something he had read. He said that Mozart, Beethoven, and Chopin never died. They simply became music."

~ Dr. Robert Ford.

Projects

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications.

Using TensorFlow's computer vision to recognize different items of clothing

Overview

Computer Vision, often abbreviated as CV, is defined as a field of study that seeks to develop techniques to help computers “see” and understand the content of digital images such as photographs and videos.[1]

Objective

Create and train a computer vision model to recognize different items of clothing using TensorFlow on Google Cloud Compute Engine.

The goal is for the model to figure out the relationship between the training data and its labels. Once training is complete, you want your model to see fresh images of clothing that resemble your training data and predict which class of clothing they belong to.

Key Concepts

  • TensorFlow
  • Computer Vision
  • Convolutional Neural Networks
  • tf.keras
  • Neural Networks
  • Activation Functions
  • Model Optimizers & Loss Functions

Data Sources, Model Design, Compile, Training & Validation

Data Sources

This involved training a neural network to classify images of clothing from a dataset called Fashion MNIST. This dataset contains 70,000 items of clothing belonging to 10 different categories of clothing.

60,000 images were used to train the network and 10,000 images were used to evaluate how accurately the network learned to classify images.

Model Design
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

Model Parameters
  • [Sequential] This defines a SEQUENCE of layers in the neural network.
  • [Flatten] Images are of shape (28, 28), i.e., the values are in the form of a square matrix. Flatten takes that square and turns it into a one-dimensional vector.
  • [Dense] Adds a layer of neurons.

Activation Functions
  • [ReLU] effectively means: if X > 0, return X; else return 0. Only values of 0 or greater are passed to the next layer in the network. Increasing the layer to 128 neurons means more calculations, which slows down the training process.
  • [Softmax] takes a set of values, and effectively picks the biggest one so you don't have to sort to find the largest value. For example, if the output of the last layer looks like [0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05], it returns [0,0,0,0,1,0,0,0,0].
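The two activation functions can be sketched numerically. This is a quick illustration in plain NumPy, not the exact TensorFlow kernels:

```python
import numpy as np

def relu(x):
    # if x > 0 return x, else return 0 (applied element-wise)
    return np.maximum(x, 0.0)

def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

# The example logits from the text: index 4 (9.5) dominates.
logits = np.array([0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05])
probs = softmax(logits)

# Marking the largest probability recovers the "winner" without sorting.
one_hot = (probs == probs.max()).astype(int)
print(one_hot.tolist())  # → [0, 0, 0, 0, 1, 0, 0, 0, 0]
```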

Model Compile

The goal is to figure out the relationship between the training data and its labels.

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

Loss indicates the model's performance by a number. If the model is performing better, loss will be a smaller number. Otherwise loss will be a larger number.

Notice the metrics= parameter. This allows TensorFlow to report on the accuracy of the training after each epoch by checking the predicted results against the known answers (labels). It basically reports back on how effectively the training is progressing.
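Putting the compile settings together with training, a minimal runnable sketch might look like this. The data here is random stand-in data, not Fashion MNIST, and the layer sizes mirror the model above; it assumes TensorFlow is installed:

```python
import numpy as np
import tensorflow as tf

# Stand-in data shaped like Fashion MNIST images and integer labels
# (real code would use tf.keras.datasets.fashion_mnist.load_data()).
x_train = np.random.rand(100, 28, 28).astype("float32")
y_train = np.random.randint(0, 10, size=(100,))

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax),
])

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

# Each epoch, Keras records the loss and the metric passed via metrics=.
history = model.fit(x_train, y_train, epochs=2, verbose=0)
print(sorted(history.history.keys()))
```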

Optimizers

One of the two arguments required for compiling a tf.keras model. An optimizer is an algorithm that adjusts attributes of the neural network, such as its weights and learning rate, to reduce the loss and improve accuracy.

The model was compiled with the following loss function and metric:

  • [SparseCategoricalCrossentropy()] Use this crossentropy loss function when there are two or more label classes and labels are provided as integers. To provide labels in one-hot representation, use the CategoricalCrossentropy loss instead. y_pred should have one floating-point value per class for each feature, and y_true a single floating-point value per feature.
  • [SparseCategoricalAccuracy()] This metric creates two local variables, total and count, that are used to compute the frequency with which y_pred matches y_true. That frequency is ultimately returned as sparse categorical accuracy: an idempotent operation that simply divides total by count.

Model Validation

Initially the neural network was about 89% accurate in classifying the training data. It figured out a pattern match between the image and the labels that worked 89% of the time.

Derive insights from your images in the cloud or at the edge with AutoML Vision or use pre-trained Vision API models to detect emotion, understand text, and more.

Image detection and moderation using Sightengine APIs

Summary

How to prevent unwanted or inappropriate image display on a web or mobile platform using an image-moderation AI API called Sightengine.

Objective

Establish an automated way to prevent the upload of unwanted / adult images & video to a social network application.

How the problem was solved:

Sightengine is an Artificial Intelligence company that uses proprietary state-of-the-art Deep Learning systems to provide powerful image and video analysis through simple and clean APIs. They provide different services in the field of image recognition like:

  • Face
  • Scam
  • Nudity
  • Celebrity
  • Minors
  • Weapons
  • Alcohol
  • Drugs
  • Image quality
  • Offensive and hate signs and symbols
  • Artificial text recognition

Implementation

Data Sources

Sightengine has brilliantly reduced the complexity of an AI detection system down to 2 lines of code. The first one initializes the engine, the second checks the image based on the parameter passed, in our case, we’ll focus on the nudity detection model.

The Nudity Detection Model determines if an image contains some level of nudity along with a description of the “level” of nudity.

These levels are based on the following criteria:

  1. Raw nudity (X-rated material such as genitals, bare breasts…)
  2. Partial nudity due to the presence of women in bikinis
  3. Partial nudity due to the presence of bare-chested males or suggestive cleavage
  4. No nudity (safe content)

Probability analysis and meaning

The probability score for each of these criteria is between 0 and 1, with 1 being the highest probability and 0 the lowest.

$sightEngine = new SightengineClient(env('SIGHTENGINEUSER'), env('SIGHTENGINEKEY'));
$imageCheck = $sightEngine->check(['nudity'])->set_file($localFilePath);

  • [raw] A “raw” score of 0.01 means the image has the lowest probability of containing raw nudity.
  • [safe] A “safe” score close to 1 tells us that the image is safe.
  • [partial] A “partial” score of 0.01 means there is the lowest probability of implied/partial nudity in the image.
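Acting on those scores in application code could look like the following sketch. The helper name and thresholds are ours, not part of Sightengine's API, and the dict stands in for the parsed API response:

```python
# Hypothetical moderation helper: decide whether an image passes, given
# the nudity scores parsed from a Sightengine check response.
RAW_THRESHOLD = 0.5      # assumption: tune per application
PARTIAL_THRESHOLD = 0.5  # assumption: tune per application

def is_image_allowed(nudity_scores: dict) -> bool:
    raw = nudity_scores.get("raw", 0.0)
    partial = nudity_scores.get("partial", 0.0)
    safe = nudity_scores.get("safe", 0.0)
    # Block if raw or partial nudity is likely; otherwise require that
    # the "safe" score dominates the others.
    if raw >= RAW_THRESHOLD or partial >= PARTIAL_THRESHOLD:
        return False
    return safe > max(raw, partial)

# Scores like the example discussed above: low raw/partial, high safe.
print(is_image_allowed({"raw": 0.01, "partial": 0.01, "safe": 0.98}))  # → True
```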

AutoML enables developers with limited machine learning expertise to train high-quality models specific to their business needs. Build your own custom machine learning model in minutes.

Classify images with AutoML using Vertex AI

Objective

Use an image dataset to train an AutoML model that classifies flowers into their assigned labels.

Key Concepts

  • AutoML
  • Image Classification
  • Vertex AI

Data Sources

The image files used are from the flower dataset. These input images are stored in a public GCS bucket with a CSV file for data import. This file has two columns: the first column lists an image's URI in GCS, and the second column contains the image's label.

On the AutoML console, select “Create dataset” and set the import file path to the CSV file:
gs://cloud-samples-data/ai-platform/flowers/flowers.csv
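The two-column import format can be sketched with a quick parsing example. The rows below are placeholders, not the real bucket contents:

```python
import csv
import io

# Illustrative rows in the import format: GCS URI, then label.
sample = (
    "gs://my-bucket/flowers/img1.jpg,daisy\n"
    "gs://my-bucket/flowers/img2.jpg,tulips\n"
)

rows = list(csv.reader(io.StringIO(sample)))
uris = [uri for uri, _ in rows]
labels = [label for _, label in rows]
print(labels)  # → ['daisy', 'tulips']
```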

Training an AutoML Image Classification Model

The AutoML training method allows the user to train the model with minimal effort and ML expertise. The node budget was set to 8 hours, and a notification was sent when training finished. Training completed in 30 minutes with an accuracy score of 0.98.

Model Deployment

After the AutoML image classification model completed training, the next step was to create an endpoint and deploy the model to the endpoint.

Endpoints can be created from the evaluation tab on the training page. This was named “automl_image”. The Model settings accepted the traffic split of 100% and 1 node was deployed to serve the endpoint prediction.

After the model was deployed to this new endpoint, the next step was to send an image to the model for label prediction.

Model Prediction

After the endpoint creation process finished, we sent a single image annotation (prediction) request in the console. This was done by using the “Test your model” section and uploading a picture for prediction.
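A sketch of how such an image might be packaged for a programmatic prediction request. This assumes AutoML image endpoints accept a base64-encoded "content" field; actually sending the request would use the google-cloud-aiplatform client:

```python
import base64

def make_image_instance(image_bytes: bytes) -> dict:
    # Encode the raw image bytes as base64 text, the shape assumed for
    # an AutoML image classification prediction instance.
    return {"content": base64.b64encode(image_bytes).decode("utf-8")}

# Placeholder bytes standing in for a real image file's contents.
instance = make_image_instance(b"\x89PNG...fake bytes")
print(sorted(instance.keys()))  # → ['content']
```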

Classify text with AutoML using Vertex AI

Objective

This project demonstrates how to create a model for classifying content using Vertex AI. The project trains an AutoML model by using a corpus of crowd-sourced "happy moments" from the Kaggle open-source dataset HappyDB. The resulting model classifies happy moments into categories reflecting the causes of happiness.

Key Concepts

  • AutoML
  • Text Classification
  • Vertex AI

Data Sources

The text files used are from the HappyDB dataset, with 24,528 rows. Copy the data from the cloud-ml-data GCS bucket into your own bucket.

gsutil -m cp -R gs://cloud-ml-data/NL-classification/happiness.csv gs://${BUCKET}/text/

This file has two columns: the first column contains the happiness text, and the second column contains its label.

On the AutoML console, select “Create dataset”, name it “auto_ml_text_classification”, set the “Data type” and “Objective” from the “Text” tab, select the “Text classification (single-label)” radio button, and set the import file path to the CSV file: gs://paulkamau-lcm/text/happiness.csv

Training an AutoML Text Classification Model

The AutoML training method allows the user to train the model with minimal effort and ML expertise. The node budget was set to 8 hours; training took several hours, and a notification was sent when it finished.


Model Deployment

After the AutoML text classification model completed training, the next step was to create an endpoint and deploy the model to the endpoint.

Endpoints can be created from the evaluation tab on the training page. This was named “automl_text”.

Model Prediction

After the endpoint creation process finished, we sent a single text classification (prediction) request in the console. This was done by using the “Test your model” section and entering text for prediction.

Classify Video with AutoML using Vertex AI

Objective

This project demonstrates how to create a model for classifying content using Vertex AI. The project trains an AutoML model by using a set of videos on GCS.

Key Concepts

  • AutoML
  • Video Classification
  • Vertex AI

Data Sources

The video files used are from the HMDB dataset. Copy the data from the automl-video-demo-data GCS bucket into your own bucket.

gsutil -m cp -R gs://automl-video-demo-data/hmdb_split1_5classes_all.csv gs://auto-ml-tutorials/video/

This file has two columns: the first column lists the video's GCS source URI, and the second column contains the video's label.

On the AutoML console, select “Create dataset”, name it “auto_ml_video_classification”, set the “Data type” and “Objective” from the “Video” tab, select the “Video classification” radio button, and set the import file path to the CSV file: gs://auto-ml-tutorials/video/hmdb_split1_5classes_all.csv

Evaluation

After the AutoML video classification model completed training, the next step was to create an endpoint and deploy the model to it. The model's average precision and recall were 100%, and the confusion matrix showed 100% agreement between true and predicted labels.

Endpoints can be created from the evaluation tab on the training page. This was named “automl_video”.


Batch Prediction

Model predictions were done in batch format. The batch name, “demo_data_predictions”, the source path, and the destination were set. The prediction results are stored in the GCS bucket.

Viewing the results

In the results for the video annotation, Vertex AI provides three types of information:

  • Video-level labels are under the Segment tab; shot labels within the video are under the Shot tab.
  • The 1 Second Interval tab contains the second-by-second labeling.
  • Changing the threshold allowed me to see more labels: AutoML video only displays labels above the specified threshold. Failed predictions appear in the Recent Predictions list.
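The threshold behavior can be sketched as a simple confidence filter. The labels and scores below are made up for illustration:

```python
# Predicted labels with confidences, as a video model might return them
# (values here are invented for the example).
predictions = [
    {"label": "cartwheel", "confidence": 0.92},
    {"label": "pullup", "confidence": 0.41},
    {"label": "golf", "confidence": 0.07},
]

def labels_above(preds, threshold):
    # Keep only labels whose confidence clears the display threshold.
    return [p["label"] for p in preds if p["confidence"] >= threshold]

print(labels_above(predictions, 0.5))   # → ['cartwheel']
print(labels_above(predictions, 0.05))  # lowering the threshold shows more labels
```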

Classify tabular data with AutoML using Vertex AI

Objective

Build a binary classification model from tabular data using Vertex AI.

The goal of the trained model is to predict whether a bank client will buy a term deposit (a type of investment) using features like age, income, and profession. This type of model can help banks determine who to focus its marketing resources on.

Key Concepts

  • AutoML
  • Binary Classification
  • Vertex AI

Data Sources

This tutorial uses the Bank marketing open-source dataset.

On the AutoML console, select “Create dataset” and set the import file path to the Bank marketing CSV file.

Training an AutoML Binary Classification Model

The AutoML training method allows the user to train the model with minimal effort and ML expertise. Node budget was set to 8 hours and training took several hours and a notification was sent after.

Model Deployment

After the AutoML tabular classification model completed training, the next step was to create an endpoint and deploy the model to the endpoint.

Endpoints can be created from the evaluation tab on the training page. The model settings accepted the traffic split of 100%, and 1 node was deployed to serve endpoint predictions.

After the model was deployed to this new endpoint, the next step was to send sample client data to the model for prediction.

Model Prediction

After the endpoint creation process finished, we sent a single prediction request in the console. This was done by using the “Test your model” section and entering sample feature values for prediction.

Explainable AI is a set of tools and frameworks to help you understand and interpret predictions made by your machine learning models, natively integrated with a number of Google's products and services

BigQuery ML lets you create and execute machine learning models in BigQuery using standard SQL queries. BigQuery ML democratizes machine learning by letting SQL practitioners build models using existing SQL tools and skills. BigQuery ML increases development speed by eliminating the need to move data.

Taxi Fare prediction with a BigQuery ML Forecasting Model

Objective

Create a machine learning model inside BigQuery to predict the fare of a cab ride given your model inputs, then evaluate the performance of the model and make predictions with it.
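A hedged sketch of the kind of BigQuery ML statement behind such a model. The dataset and table names are placeholders, and executing it would require the google-cloud-bigquery client and a real project:

```python
# Placeholder BigQuery ML statement: train a linear regression on taxi
# trips with fare_amount as the label (table names are illustrative).
create_model_sql = """
CREATE OR REPLACE MODEL `my_dataset.taxifare_model`
OPTIONS (model_type='linear_reg', input_label_cols=['fare_amount']) AS
SELECT
  fare_amount,
  pickup_longitude, pickup_latitude,
  dropoff_longitude, dropoff_latitude,
  passenger_count
FROM `my_dataset.taxi_trips`
WHERE fare_amount > 0
"""

# Running it would look roughly like:
#   from google.cloud import bigquery
#   bigquery.Client().query(create_model_sql).result()
print(len(create_model_sql) > 0)
```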

Fraud Detection on Financial Transactions with Machine Learning on Google Cloud

Objective

Explore the financial transactions data for fraud analysis, apply feature engineering and machine learning techniques to detect fraudulent activities using BigQuery ML.

Predicting Soccer Match Outcomes with BigQuery ML

Objective

Use BigQuery to load the data from the Cloud Storage bucket, write and execute queries, and analyze the soccer event data. Then use BigQuery ML to train an expected-goals model on the event data and evaluate the impressiveness of World Cup goals.

Publications

Using Laravel and Sightengine’s Artificial Intelligence as a Service to detect and filter inappropriate images

Artificial Intelligence as a service, or AIaaS, is the on-demand delivery and use of AI and Deep learning capabilities towards an individual or business objective.

An introduction to Sensory ML with Google’s Pre-built AI

Getting started with AI made for human vision, speech, text, phonics, auditory & neural pathways.

Copyright © 2022. All Rights Reserved