
Interest in data science and machine learning has skyrocketed as businesses realize that they need deeper and more valuable insights into their data. But how do you manage an essentially open-ended process where you may never get the 'right' answer?

Endjin have developed a pragmatic approach to data science, based on a series of iterative experiments, relying on evidence-based decision making to answer the most important business questions.

Following this process allows us to iterate to insights quickly when there are no guarantees of success.

1. Understand the business objective

We'll help you clarify what you're doing and why. The key is aligning data science work to business needs. Data exploration, preparation and model development should all answer a defined business question, tied to an overall strategy or business goal.

2. Start with a hypothesis

Once the objective is clear, we'll define a testable hypothesis with parameters and success criteria upfront. Defining what success looks like before starting the experiment keeps decisions aligned with business goals.
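One way to make "success criteria upfront" concrete is to record the hypothesis as a small data structure before any modelling starts. The following is a minimal sketch, not a prescribed format: the `Hypothesis` class, its field names and the churn figures are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A testable statement with success criteria agreed before the experiment."""

    statement: str
    metric: str                    # name of the metric used to judge the outcome
    baseline: float                # current performance without the model
    target: float                  # threshold that counts as success
    higher_is_better: bool = True  # direction in which the metric improves

    def is_supported(self, observed: float) -> bool:
        """Return True when the observed metric meets the agreed target."""
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target


# Hypothetical example values, for illustration only.
churn = Hypothesis(
    statement="Usage data can identify customers likely to churn next quarter",
    metric="recall",
    baseline=0.40,
    target=0.60,
)

print(churn.is_supported(0.65))
print(churn.is_supported(0.50))
```

Because the object is frozen, the target cannot be quietly adjusted once results start coming in, which keeps the decision tied to what was agreed at the start.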

3. Time-box an experiment

Because data science is open-ended and may never produce the 'right' answer, we'll agree up front how much time and effort to spend proving or disproving the hypothesis, working in weekly iterations.

4. Prepare the data

Choosing and preparing the data is the most critical step. As a rule, the more relevant data you have and the better its quality, the better the results are likely to be.

Preparation means handling missing, duplicate and outlying values, combining internal and external sources, and using ETL processing alongside statistical analysis to identify the most useful data attributes.
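The first part of that preparation (missing, duplicate and outlying values) can be sketched in a few lines. This is an illustrative example using only the Python standard library. The `prepare` function, the `spend` field and the sample records are hypothetical, and median imputation and the 1.5 × IQR rule are common default choices assumed here, not a statement of how any particular project handles its data.

```python
from statistics import median, quantiles


def prepare(rows, field):
    """Drop duplicate records, fill missing values, and flag outliers in `field`."""
    # 1. Remove exact duplicate records while preserving their order.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))

    # 2. Impute missing values with the median of the observed values.
    observed = [r[field] for r in unique if r[field] is not None]
    fill = median(observed)
    for r in unique:
        if r[field] is None:
            r[field] = fill

    # 3. Flag values more than 1.5 * IQR outside the quartiles as outliers.
    q1, _, q3 = quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    for r in unique:
        r["outlier"] = not (low <= r[field] <= high)
    return unique


# Hypothetical sample records, for illustration only.
rows = [
    {"id": 1, "spend": 10.0},
    {"id": 2, "spend": 12.0},
    {"id": 2, "spend": 12.0},   # duplicate record
    {"id": 3, "spend": None},   # missing value
    {"id": 4, "spend": 11.0},
    {"id": 5, "spend": 13.0},
    {"id": 6, "spend": 10.0},
    {"id": 7, "spend": 11.0},
    {"id": 8, "spend": 12.0},
    {"id": 9, "spend": 13.0},
    {"id": 10, "spend": 500.0},  # suspiciously large value
]

clean = prepare(rows, "spend")
for r in clean:
    print(r)
```

Outliers are flagged and kept so that someone who knows the business can decide whether a value is an error or a genuine signal.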

5. Experiment with algorithms

With prepared data, we apply appropriate algorithms to find a model that answers your questions. The best algorithm depends on the size, quality and nature of your data, the type of question you're asking, and what you want to do with the answer. We also build in mechanisms for re-training the model so it stays relevant as your data evolves.
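Experimenting with algorithms usually means fitting several candidates on the same training data and comparing them on data they have not seen. The sketch below shows that pattern with two deliberately simple candidates, a mean baseline and a least-squares line, scored by mean absolute error. The function names and the data points are hypothetical. Real experiments would typically use a machine learning library and a metric chosen to match the hypothesis.

```python
from statistics import fmean


def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean of the training targets."""
    mu = fmean(ys)
    return lambda x: mu


def fit_line(xs, ys):
    """Candidate: straight line fitted by ordinary least squares."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mean_abs_error(model, xs, ys):
    """Average absolute difference between predictions and actual values."""
    return fmean([abs(model(x) - y) for x, y in zip(xs, ys)])


# Hypothetical data, split into a training set and a held-out test set.
train_x, train_y = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
test_x, test_y = [7, 8], [14.1, 15.9]

candidates = {"mean": fit_mean, "line": fit_line}
results = {
    name: mean_abs_error(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best = min(results, key=results.get)
print(results, best)
```

Keeping a trivial baseline in the comparison shows whether a more complex model is earning its keep. Re-running the same fit-and-score loop on fresh data is also one simple way to re-train as the data evolves.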

6. Evaluate and iterate

We document the process as lab notes and present an executive summary of results and recommendations, giving you the evidence to decide whether to continue, stop, or change direction.

7. Productionize

If you find a successful model, we help you realise the value across the organisation by developing flexible, scalable data processing pipelines to power your intelligent solutions.