By Matthew Adams, Co-Founder
Guest Blog Post: Hello World! I'm Ray and I'm doing work experience.

Hello, I am Ray Poynton-Gillott, a Year 10 student studying computer science at GCSE. I'm spending a week of work experience at endjin. Endjin are a fully remote company, which means everyone works from home, so I spent the day with Matthew, working from his home office and connecting with other people over Slack and Teams.

So far this week I have:

  • Sat in on the morning meetings where the team plans the work for the week; the team talked a little bit about last week's work and this week's slides for what needed to be done, as well as the budget burndown and contract statuses.
  • Paired with some of the more senior staff members to see what they have been doing, any problems they have and solutions to those problems.

Open Source Software

The first member I was paired with was Ian. With Ian I learnt about their open source project Corvus.JsonSchema. The fact that Corvus is Open Source means that anyone has access to the source code, and they are encouraged to contribute changes, issues, and suggestions. Ian showed me the process of updating the project and what needed to be done for an official update to be approved.

This process involved uploading the changes to GitHub, having someone review and test them, fixing any bugs, re-uploading, and repeating until the changes are approved. Then the project is packaged and uploaded to the official page for Corvus.JsonSchema on nuget.org.

Data Pipelines

Next I worked with Jon. I got onto a call with Jon, in which he talked about using Visual Studio for Python coding and using Microsoft Synapse for a project he's working on right now. He showed me that the project had multiple stages for processing data in a pipeline. The first stage was all about gathering data, no matter where it came from or how it was stored, as long as it had to do with the project.

Then in the second stage Jon showed me how he had written some code in Python, using the library pyspark, to transform the data into a more standardised storage schema. For the next stage, Jon explained that the data would be taken from its standardised schema into more specialised schemas, to be used in specific projects.
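The three stages can be sketched in plain Python. This is only an illustration and not Jon's code: it does not use pyspark, and the data, field names and function names are all made up for the example.

```python
# Hypothetical raw data: two sources describing the same thing differently.
raw_from_csv = [{"Temp (C)": "21.5", "site": "Leeds"}]
raw_from_api = [{"temperature_celsius": 19.0, "location": "York"}]


def gather():
    """Stage 1: collect everything, whatever shape it arrives in."""
    return {"csv": raw_from_csv, "api": raw_from_api}


def standardise(raw):
    """Stage 2: convert every source into one shared, standard schema."""
    rows = []
    for row in raw["csv"]:
        rows.append({"site": row["site"], "temp_c": float(row["Temp (C)"])})
    for row in raw["api"]:
        rows.append(
            {"site": row["location"], "temp_c": float(row["temperature_celsius"])}
        )
    return rows


def specialise(rows):
    """Stage 3: reshape the standard rows for one particular use."""
    return {row["site"]: row["temp_c"] for row in rows}


report = specialise(standardise(gather()))
print(report)
```

Because stage 2 hides the differences between the sources, stage 3 only ever has to understand one schema.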

Testing Data Pipelines

Jon also explained to me that he has some automated tests that run so that when the code gets changed in the future, if it breaks they will know where it broke and can fix it before it reaches production. Jon explained that earlier on in the project he didn't write many tests, because he would have had to change them a lot as the early code changed. That means there aren't enough tests now, so he has to go back and fix the bugs that weren't spotted due to the lack of tests.

One problem with this was that the tests took ages due to the way pyspark works. It splits all the data up and then re-joins it later on, which means it works efficiently on big data sets but takes ages on smaller test data sets.

One way that Jon got around this was to replace values such as the names of storage types or data sources, which are represented as strings, with numbers instead. This significantly decreases total storage and also decreases test times.
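The idea of swapping repeated strings for numbers can be shown with a small lookup table. This is a minimal sketch in plain Python, not the code from Jon's project, and the source names are invented for the example.

```python
# Hypothetical column of data: the same few names repeated many times.
source_names = ["warehouse", "web_shop", "warehouse", "warehouse", "web_shop"]

# Build a lookup table once: each distinct name gets the next free number.
codes = {}
for name in source_names:
    if name not in codes:
        codes[name] = len(codes)

# Store the small numbers instead of the full strings.
encoded = [codes[name] for name in source_names]

# The table can be reversed to get the original names back when needed.
names_by_code = {code: name for name, code in codes.items()}
decoded = [names_by_code[code] for code in encoded]

print(codes)
print(encoded)
```

Each name is stored in full only once, in the lookup table, and everywhere else it is just a number.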

Finally, during the meeting I was told about the different types of testing: unit testing, integration testing and end-to-end testing. Unit testing is small scale, testing just a few lines of code, whereas integration testing covers chunks of hundreds of lines, and end-to-end testing covers the entire project architecture as it is deployed in real life.
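Here is a small sketch of the difference between a unit test and an integration test, using made-up functions rather than anything from the real project. An end-to-end test is not shown, because it would need a whole deployed system to run against.

```python
def parse_temperature(text):
    """Turn text such as '21.5C' into the number 21.5."""
    return float(text.rstrip("C"))


def average(values):
    """Return the mean of a list of numbers."""
    return sum(values) / len(values)


def average_temperature(readings):
    """Combine the two functions above: parse every reading, then average."""
    return average([parse_temperature(reading) for reading in readings])


def test_parse_temperature():
    # Unit test: checks one small function on its own.
    assert parse_temperature("21.5C") == 21.5


def test_average_temperature():
    # Integration test: checks that the pieces work together.
    assert average_temperature(["20C", "22C"]) == 21.0


test_parse_temperature()
test_average_temperature()
print("all tests passed")
```

If the unit test fails, the bug is in one small function. If only the integration test fails, the bug is in how the functions are joined together.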

JSON Schema

I have also been working on a project of my own, learning about JSON Schema [1]. So, what is JSON Schema all about?


Let's start by explaining JSON. I had not come across this before in my Computer Science GCSE. JSON, or JavaScript Object Notation, is a simple, (usually) text-based data storage format, which means it is readable for humans and easier for them to write than a binary format. It only has a few data types, and this keeps JSON simple.

These types are: string for text; number for any number, including integers and decimals like "float" numbers in Python (which I am studying as part of my GCSE); boolean for true and false; and null for an empty value. JSON Schema also adds integer for whole numbers.

There is also object for describing values which consist of a set of keys and values (which can also be any of the JSON data types). Finally, there is array, which is a list of many values of any JSON data type.
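To see how these types line up with Python, the sketch below loads a small JSON document with Python's built-in json module and prints the Python type of each value. The document itself is invented for the example. It also includes null, JSON's way of writing "no value".

```python
import json

# A hypothetical JSON document using every JSON data type.
text = """
{
  "name": "Ada",
  "year": 10,
  "height_m": 1.75,
  "student": true,
  "nickname": null,
  "subjects": ["computer science", "maths"],
  "school": {"town": "Example Town"}
}
"""

data = json.loads(text)

# string -> str, number -> int or float, boolean -> bool, null -> NoneType,
# array -> list, object -> dict
for key, value in data.items():
    print(key, "->", type(value).__name__)
```

A JSON object looks almost exactly like a Python dictionary, and a JSON array looks like a Python list, which is why the two are easy to relate.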


This simplicity makes it quite easy to learn even if you have no computer knowledge. I found it quite easy to relate to what I know about Python, as most of the syntax is very similar. This simplicity also means that you can write almost any data structure in JSON. But when we send some data from one place to another, we need to tell people how to interpret what we've sent. That's where JSON Schema comes in.

What is JSON Schema

JSON Schema is a standard way of describing JSON data that is itself written in JSON. It gives us a set of rules that tell us how to read or write the data, so that there is less misunderstanding after data has been transferred. It also makes the data easier to test, so there are fewer errors.

It can also make things quicker and easier when you have different senders and receivers. You do not need to send over instructions on how to read the data with every request, and they don't have to use the same language to validate the data, as long as they both know how to read and apply JSON Schema.
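As an illustration, here is a small schema and a toy checker written in plain Python. The schema uses real JSON Schema keywords (type, required and properties), but the check function is a simplified sketch made up for this example. It only understands those three keywords and is not a real JSON Schema validator.

```python
import json

# A hypothetical schema, written in JSON like any other JSON document.
schema_text = """
{
  "type": "object",
  "required": ["name", "age"],
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "integer"}
  }
}
"""

# How each JSON Schema type name maps onto Python types in this toy checker.
PYTHON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "null": type(None),
    "object": dict,
    "array": list,
}


def check(value, schema):
    """Return a list of problems; an empty list means the value passed."""
    problems = []
    expected = schema.get("type")
    if expected is not None:
        matches = isinstance(value, PYTHON_TYPES[expected])
        # In Python, True and False also count as ints, so rule them out.
        if isinstance(value, bool) and expected != "boolean":
            matches = False
        if not matches:
            return [f"expected {expected}, got {value!r}"]
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                problems.append(f"missing required property '{key}'")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                problems.extend(check(value[key], sub_schema))
    return problems


schema = json.loads(schema_text)
print(check(json.loads('{"name": "Ada", "age": 15}'), schema))
print(check(json.loads('{"name": "Ada", "age": "fifteen"}'), schema))
print(check(json.loads('{"name": "Ada"}'), schema))
```

The sender and the receiver only need to share the schema once. After that, each side can check data against it in whatever programming language it likes.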

Learning JSON Schema

The JSON Schema organization publishes an online learning tool called Tour JSON Schema. It gives you a series of interactive lessons that explain different features of JSON Schema, taking you through all the important keywords and concepts. There's also an interactive editor that gives you some example JSON schema, and a puzzle to figure out how to complete the schema to achieve some goal related to the topic of the lesson. If you get the schema correct, the Validate button passes and you can go on to the next lesson.

It took me about 3 days to complete the tour from start to finish, and they will email me a certificate to say that I have done it.

Good things about the tour

The tour explains the differences between the features well, and what they do, in a way that even someone who's never used a programming language or looked at data could understand. Another good thing about the tour is that it gives helpful examples of what the code should look like without telling the user how the code should be written. Another great thing is that it links the data types back to programming languages to give people who program a better understanding of what the data types do.

Bad things about the tour

One problem with the tour is that it expected the user to have a base knowledge of JSON. That isn't too problematic as it's easy and quick to learn. Another thing that is a bit frustrating at the start is that the editor doesn't have any help for you when you don't know why the code isn't working, and it turns out it is just because you are missing a curly brace or a square bracket. This could be solved using something like a spell check for the code written, so when there is an error it says what is wrong and there is less trial and error.
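The sketch below shows the kind of help that is possible. It is only an illustration using Python's built-in json module, and says nothing about how the tour's editor actually works. The explain function and the broken example are made up. When JSON cannot be parsed, Python raises a json.JSONDecodeError that records the line, the column and a short message.

```python
import json

# Hypothetical broken JSON: the closing square bracket and curly brace are missing.
broken = '{"name": "Ada", "subjects": ["maths", "art"'


def explain(text):
    """Return a short hint about where the JSON stops making sense."""
    try:
        json.loads(text)
    except json.JSONDecodeError as error:
        return f"line {error.lineno}, column {error.colno}: {error.msg}"
    return "looks fine"


print(explain(broken))
print(explain('{"name": "Ada", "subjects": ["maths", "art"]}'))
```

A hint like this points straight at the place where a bracket or brace is missing.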

End of the week

Overall this week has been a fun experience and has given me insight into what a job in the field of computer science is like. I have enjoyed learning JSON and JSON Schema, as the lessons were nicely structured without feeling like school lessons. Another great thing about the week is how inclusive the team have been, explaining what they've been working on in enough detail that I can understand what is happening.

And the most important lesson I've learned all week: "take notes on everything as you go along".


  1. Endjin are a sponsor of the JSON Schema organization, and one of endjin's founders, Matthew Adams, is a member of the JSON Schema Technical Steering Committee.

Matthew Adams

Matthew was CTO of a venture-backed technology start-up in the UK & US for 10 years, and is now the co-founder of endjin, which provides technology strategy, experience and development services to its customers who are seeking to take advantage of Microsoft Azure and the Cloud.