Saturday, June 30, 2018

Beyond CAPTCHA

If you have ever filled an online form, you are bound to have encountered a CAPTCHA.

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. I haws been named after the computer scientist Alan Turing.

What is a Turing Test?
Turing test is used by humans to see if a machine can converse like a human. The turing test has evolved to an interesting concept. Today, its a scenario in which a human interrogator uses some sort of interface, often a computer terminal, to communicate with someone else. This some one else could be a another computer program or a human being.If a computer program manages to fool most of the human interrogators into believing that it's a human, the program is said to have the turing test.

The CAPTCHA test is a turing test designed as a short hand for a gateway that allows humans to pass through while keeping machines at the border. The idea of a CAPTCHA is  to build something which is simple for a human being to decipher while difficult for a machine to do so. This is important because it does not do any good if it discourages humans to fill out the said firm.
There are a few things a software can do better than humans, like driving a car or playing a game of chess under normal circumstances, but there are still things that humans can do better, at least that's how things stand today.

A simple CAPTCHA example the case where you have deformed letters and numbers semi-obscured by other shapes. While humans can easily differentiate these characters from the distractions, computer programs have a tough time segregating the signals from the noise. 

Evolutionary speaking, while AI keeps evolving all the time, humans have kind of sowed down in comparison. For ex: Researchers at the AI company Vicarious may have just made CAPTCHAs obsolete, however, by creating a machine-learning algorithm that mimics the human brain. They have created a artificial neural network that the firm claims can solve CAPTCHA puzzles by simulating the human capacity of "common sense". It works on a neural network that can suss out the shape of letters and numbers while while filtering out the distracting noise, by mimicking how a human brain works.
The artificial neural network attacks the problem in layers with each neuron attempting a part of the problem and then combining all solutions to come up with complete solution of the CAPTCHA. Solving CAPTCHAs is not the goal of the research, but it provides insight into how our brains work and how computers can replicate it.

According to security experts, we can expect to see hackers to make use of these technologies in a few months. So CAPTACHAs will become less effective in keeping the bots out.
As AI keeps getting better at  solving the CAPTCHAs, designers are coming up with more and more creative ways and these affect both humans and non-humans.
For example some sites have a small passage followed by a simple question about the content matter to test if the attempt was made by a human.

Tuesday, June 19, 2018

12-Factor Apps in Layman's terms

Popular platform-as-a-service provider Heroku has published a set of best practices called the Twelve-factor-app for building deployable sowtware-as-a-service apps that are

  • Portable
  • Continually Deployable and
  • Scalable
Here are the twelve principles in a more precise and understandable version:

CODEBASE: One codebase tracked by revision control, many deploys 

Put all your code in a version control system, the best option for me is the git.
You can have multiple branches but never multiple repositories for the same service/ microservice. There can be only one codebase per app, but there can be many deployments. 

DEPENDENCIES: Explicitly declare and isolate dependencies

This has been more important ever since configuration tools like Chef and Puppet have been used extensively used by DevOps to automate the configuration and deployment process.

Consider maven as dependency management tool, manifest will be pom.xml, which fetches dependencies as jar (artifacts) from various repositories.

All your dependencies, third party or otherwise must be declared and completely isolated across environments. Most of the modern programming languages have a built-in support for this. You can declare all libraries in your code, and during the deployment pass the environment variable as a part of the command line that picks up the correct file to deploy

CONFIG: Store config in the environment

Configuration are those part of the application that change based on the environment where the application is deployed on. For example database connection properties, application specific information such as host IP and port etc.

There should be a strict separation between config and code. Code should remain the same irrespective of where the application is being deployed, but configurations can vary.

BACKING SERVICESTreat backing services as attached resources

Backing services refer to the infrastructure and other services by which the application communicates over the network. Database, Message Brokers, other API-accessible consumer services such as Authorization Service, Twitter, GitHub etc., are loosely coupled with the application and treat them as resource.

The idea is to be able to swap any of these resources with a different provider without having to make any changes to the source code.

BUILD, RELEASE, RUN: Strictly separate Build and Run stages

The twelve-factor-app suggests strict separation of the build, release and run stages.
Build - converting source code into an executable bundle known as the build. (jar, war, ear etc.)
Release - Or Deployment. Getting the build and combining it with a configurations of the specific  environment, assign it a unique release number, and made ready to run.
Run - Execute the package on the specific environment.

The 'build'cycle does most of the heavy lifting and the run stage should be very light weight. Tools that help in achieving a full CI/CD pipeline are Jenkins, Thoughtworks go, and Codeship to name a few.

PROCESSES: Execute the app as one or more stateless processes

Modern applications are deployed on many nodes usually with a load balancer to direct the traffic to enable quick request handling. In such cases, we cant guarantee that consecutive  requests from the same client would go to the same node.  Hence its unwise, to rely on data stored in the previous requests, since it would not be available if the next request is directed to another node.
As a rule, you want each of those instances of running code to be stateless.

There are a number of ways to achieve this:

  • Save the state of the process in the database or shared storage.
  • Use scalable cache storage like Memcahe or Redis.
  • Package assets in executables (e.g. by using webjars at build time)


PORT BINDING: Export services via port binding

This is an extension of  the Backing Service factor.
Make sure  that your application is visible to others via port binding, so that other services can use it like a resource.

CONCURRENCY: Scale out via the process model

This is all about scalability. The idea with more smarter applications is to scale horizontally by deploying more copies of our application on multiple nodes rather than scaling vertically, i.e running a single instance on a much powerful system.

In particular, you’ll be able to do more stuff concurrently, by smoothly adding additional servers, or additional CPU/RAM and taking full advantage of it through the use of more of these small, independent processes.

DISPOSABILITY: Maximize robustness with fast startup and graceful shutdown`

Processes in twelve-factor-apps should start and stop in minimal time.

Fast startup is based on our idea of scalability and also gives way to the usage of microservices as opposed to monolithic applications. If an application takes 30 seconds to start up and handle requests, it defeats the idea of rapid releases.

Graceful shutdown also involves leaving the system in the correct state. All resources should be freed and all unfinished tasks should be returned to queue. Crashes also should be handled. Crashes, if they happen, the app should be start back up cleanly.

DEV/ PROD PARITYKeep development, staging, and production as similar as possible 

In the recent years, its become pertinent to have short and rapid production cycles. This means that the time window between the development of a change and deployment of the said change is become very small, sometimes a matter of hours. With this window being so small, you do not want to spend time on the deployment process, or carry the risk of something breaking in production.

Hence, its highly recommended to keep the developers environment as similar to the production as possible. This includes using the same third party(or otherwise) services, same configuration management tools, same softwares and libraries.

LOGS: Treat logs as event streams

Logs are highly useful. They can carry variety of information on many levels. They can come extremely handy while trouble shooting the errors in production. Read : Logging the right way

Other than trouble shooting customer issues, you can use error monitoring service like  Airbrake or Papertrail or data mining services like Logstash or Splunk.

ADMIN PROCESSESRun admin/management tasks as one-off processes

Your app might need some admin jobs at times, for example cleaning up the database for bad data, switching on/off configurations or features, generating reports for analytics or audit purposes.

The twelve factor app suggests that you do these one off tasks from an identical environment in production. Don''t directly access the database, do not access it from a terminal window.

SUMMARY:

The twelve principles mentioned above might not seem novel, you might already be using some already in your engineering cycles. But if you are running a microservice architecture, its important that you take these principles seriously. You might not see the benefit right away, but it will become highly important when you are running multiple services across multiple environments.

REFACTORING

 What is Refactoring? A software is built initially to serve a purpose, or address a need. But there is always a need for enhancement, fixin...