This is the first article in my new series of AEP guides. Before we start – it’s all been said before. There’s already the Adobe documentation and an older article from Bounteous. My goal for this series is to help AEP concepts make way more sense to front-end technical marketers, at the risk of leaving out a lot of the techy nuance. Nothing I’ve read really does it for me, so here we go. Let’s start with the foundation of AEP – the schema.
What is a schema in AEP?
A schema is simply a blueprint for your data. It’s like saying a door needs to exist on a wall with a certain amount of clearance – and it only passes inspection if it’s built according to spec; otherwise it’s tossed. More literally, the schema says how the data should look when it gets to AEP. The example above shows the AEP interface, but an actual schema takes the data that’s passed into AEP:
{
  "pageName": "us:jiamlytics:article",
  "userId": 123523,
  "pageTitle": "Some Article",
  "registered": true
}
And validates it against a set of criteria (below is a JSON Schema I generated from that object):
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Generated schema for Root",
  "type": "object",
  "properties": {
    "pageName": { "type": "string" },
    "userId": { "type": "number" },
    "pageTitle": { "type": "string" },
    "registered": { "type": "boolean" }
  },
  "required": [ "pageName", "userId" ]
}
There are a few important pieces here. Ignore the $schema and title keywords in that code. The rest is the (simplified) blueprint for your data! Each property has a type, so you can specify whether the data should come in as a number, string, boolean, etc. There’s also the required array, which lists the fields that must be present in the JSON object. In this example, that’s just pageName and userId. Let’s recreate this in AEP’s schema interface:
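To make the “passes inspection” idea concrete, here’s a minimal Python sketch that checks a record against the generated schema above. It’s a toy that only understands the type and required keywords; a real JSON Schema validator (and AEP’s own ingestion checks) do much more, and the validate function is a hypothetical helper written just for this illustration.

```python
# Toy validator: only "type" and "required" are checked. This is NOT how AEP
# validates data internally; it just mirrors the blueprint idea.
SCHEMA = {
    "type": "object",
    "properties": {
        "pageName": {"type": "string"},
        "userId": {"type": "number"},
        "pageTitle": {"type": "string"},
        "registered": {"type": "boolean"},
    },
    "required": ["pageName", "userId"],
}


def _matches(value, json_type):
    """Check one value against a JSON Schema type name (toy subset only)."""
    if json_type == "string":
        return isinstance(value, str)
    if json_type == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if json_type == "boolean":
        return isinstance(value, bool)
    return False  # types outside this toy subset are treated as failures


def validate(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record passes."""
    errors = []
    for field in schema.get("required", []):
        if field not in record:
            errors.append(f"missing required field: {field}")
    for field, value in record.items():
        spec = schema["properties"].get(field)
        # Fields not listed under "properties" are allowed, which matches
        # JSON Schema's default when "additionalProperties" isn't set.
        if spec is not None and not _matches(value, spec["type"]):
            errors.append(f"{field} should be {spec['type']}")
    return errors


print(validate({"pageName": "us:jiamlytics:article", "userId": 123523}))  # []
print(validate({"pageName": "us:jiamlytics:article", "userId": "123523"}))  # ['userId should be number']
```

A userId sent as a string, or a payload without pageName, gets “tossed” the same way the badly built door does.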
This is literally just an interface built around that JSON schema above (with a few more optional parameters). Someone from Adobe will certainly chime in in the comments and say I’m oversimplifying it – and I am; but the point of this article isn’t to go into every tiny bit of nuance. That’s the basics of what a schema is – now let’s go one layer down.
What is a Class?
A Class tells you whether the dataset describes stuff (Records) or logs when stuff happens (Time-Series). So a Record schema might be the entire database of products I sell – like a shoe. A Time-Series schema would capture each time someone bought a shoe. You can see on the left-hand side that the difference between the two is that the Time-Series class has an eventType and a timestamp. These wouldn’t make sense for a product database… because there shouldn’t be any actual events happening to a product record (like a purchase).
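Here’s a rough Python sketch of the Record vs. Time-Series split. The class names, the sku/name/price fields, and the sample values are hypothetical; only eventType and timestamp come from the Time-Series class described above, and “commerce.purchases” is used as an example eventType value.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ShoeRecord:
    """Record class: describes a thing. No timestamp, no eventType."""
    sku: str
    name: str
    price: float


@dataclass
class PurchaseEvent:
    """Time-Series class: logs that something happened at a point in time."""
    eventType: str
    timestamp: datetime
    sku: str  # points back to the product record that was bought


# One record describes the shoe...
catalog = {"SH-001": ShoeRecord(sku="SH-001", name="Trail Runner", price=120.0)}

# ...while every purchase of it is a separate, timestamped event.
events = [
    PurchaseEvent("commerce.purchases", datetime(2024, 1, 5, tzinfo=timezone.utc), "SH-001"),
    PurchaseEvent("commerce.purchases", datetime(2024, 1, 6, tzinfo=timezone.utc), "SH-001"),
]


def purchase_revenue(events, catalog):
    """Price each purchase event by looking up the record it refers to."""
    return sum(catalog[e.sku].price for e in events if e.eventType == "commerce.purchases")


print(purchase_revenue(events, catalog))  # 240.0
```

The shoe’s record never changes when it sells; only the event log grows. That’s why timestamp and eventType belong on the Time-Series class and not on the product Record.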
Now, classes can be MORE than just a classification of the schema. You can actually add fields to the Class, so when someone selects that Class, the schema comes prepopulated with fields for a specific purpose. For instance, if we have multiple product databases for multiple websites, I might want to use a similar schema foundation; every product will obviously have a SKU, Price, and Name. So I’ll add those three fields to the Class. Now every time I want to create a schema for a new set of products, I don’t have to worry about adding those fields. But what if one store is selling furniture and the other is selling apparel? Well, I’d need to add some fields to the apparel store’s schema that don’t exist on the furniture store’s schema. That’s where Field Groups come in.
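A minimal Python sketch of a Class that comes prepopulated with fields. This is a hypothetical model (plain dictionaries and a made-up new_schema helper), not AEP’s API; it just shows every product schema inheriting SKU, Price, and Name from the Class.

```python
# Hypothetical model of a Class: a behavior (record vs. time-series)
# plus the default fields every schema built on it starts with.
PRODUCT_RECORD_CLASS = {
    "behavior": "record",
    "fields": {"sku": "string", "price": "number", "name": "string"},
}


def new_schema(title, klass):
    """Start a schema from a Class; it inherits a copy of the Class's fields."""
    return {
        "title": title,
        "behavior": klass["behavior"],
        "fields": dict(klass["fields"]),  # copy, so edits don't leak back into the Class
    }


furniture = new_schema("Furniture Products", PRODUCT_RECORD_CLASS)
apparel = new_schema("Apparel Products", PRODUCT_RECORD_CLASS)

print(sorted(furniture["fields"]))  # ['name', 'price', 'sku']
```

Both schemas get the three shared fields for free, but neither has a home for apparel-only details like size yet.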
What are Field Groups?
In AEP, Field Groups (formerly Mixins) are… groups… of fields. Super helpful, huh? Well, that’s exactly what they are: collections of fields that extend a Class when you add them to a schema. Let’s say I want to extend our basic Product Record Class to include clothing sizes and material. I’d do that with a Field Group! Field Groups let you add ad-hoc stuff like author and article information for publishing sites. That information isn’t part of a default Time-Series schema, so it would make sense to add it with a Field Group. You might save that Field Group as Content Article Information and reuse it in another Time-Series schema. So, like the thing I built in the image above, anyone can now go in and select that Field Group to use in their schema.
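Sticking with a hypothetical Python model (the Class is redefined here so the snippet stands alone), a Field Group is a reusable bundle of fields layered on top of a Class. The APPAREL_DETAILS and FURNITURE_DETAILS groups and the build_schema helper are made up for illustration. In AEP, a Field Group is also tied to the Classes it’s compatible with; this sketch skips that check.

```python
PRODUCT_RECORD_CLASS = {
    "behavior": "record",
    "fields": {"sku": "string", "price": "number", "name": "string"},
}

# Hypothetical Field Groups: reusable bundles of extra fields.
APPAREL_DETAILS = {"title": "Apparel Details",
                   "fields": {"size": "string", "material": "string"}}
FURNITURE_DETAILS = {"title": "Furniture Details",
                     "fields": {"widthCm": "number", "assemblyRequired": "boolean"}}


def build_schema(title, klass, *field_groups):
    """Class fields first, then each Field Group's fields layered on top."""
    fields = dict(klass["fields"])
    for group in field_groups:
        for name, ftype in group["fields"].items():
            # Refuse a Field Group that would silently change an existing field's type.
            if name in fields and fields[name] != ftype:
                raise ValueError(f"{group['title']} redefines {name!r} as {ftype}")
            fields[name] = ftype
    return {"title": title, "behavior": klass["behavior"], "fields": fields}


apparel = build_schema("Apparel Products", PRODUCT_RECORD_CLASS, APPAREL_DETAILS)
furniture = build_schema("Furniture Products", PRODUCT_RECORD_CLASS, FURNITURE_DETAILS)

print(sorted(apparel["fields"]))    # ['material', 'name', 'price', 'size', 'sku']
print(sorted(furniture["fields"]))  # ['assemblyRequired', 'name', 'price', 'sku', 'widthCm']
```

Same Class, different Field Groups: each store gets the shared foundation plus only the extras it needs, and APPAREL_DETAILS can be reused by any other apparel schema.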
What are Data Types?
Frederik Werner reminded me that there is one more tool in the box for building schemas – the Data Type. What is a Data Type? Well, it’s like a schema template that lives outside of Field Groups. So instead of selecting a Field Group from the left-hand side, I would select the Data Type from a field’s Type dropdown. You can see an example of what that looks like below:
This is just another way to do the same thing we’ve been doing with Field Groups. The biggest difference is you can create Data Types without having to first create a Schema – something I REALLY wish I could do with Field Groups.
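To show how that differs from a Field Group, here’s a hypothetical Python sketch using JSON Schema-style dictionaries like the generated schema earlier. A Data Type (PERSON_NAME here, made up for this example) isn’t added at the top of the schema; it’s picked as the type of one field, so its sub-fields nest under that field, and the same Data Type can be reused for several fields. The object_field and field_paths helpers are illustrative only.

```python
# Hypothetical Data Type: a reusable object shape picked as a field's *type*.
PERSON_NAME = {"firstName": "string", "lastName": "string"}


def object_field(data_type):
    """A field whose type is a Data Type: its sub-fields nest under the field."""
    return {
        "type": "object",
        "properties": {name: {"type": t} for name, t in data_type.items()},
    }


article_properties = {
    "articleTitle": {"type": "string"},
    "author": object_field(PERSON_NAME),
    "editor": object_field(PERSON_NAME),  # same Data Type, reused for a second field
}


def field_paths(properties, prefix=""):
    """List the leaf field paths, e.g. 'author.firstName'."""
    paths = []
    for name, spec in properties.items():
        path = prefix + name
        if spec["type"] == "object":
            paths.extend(field_paths(spec["properties"], path + "."))
        else:
            paths.append(path)
    return paths


print(field_paths(article_properties))
# ['articleTitle', 'author.firstName', 'author.lastName', 'editor.firstName', 'editor.lastName']
```

Defining PERSON_NAME once and reusing it keeps author and editor identical, which is the same reuse you get from Field Groups, just one level down.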
Should I use a Class, a Field Group, or a Data Type?
Okay, so we get the difference between a Class, a Field Group, and a Data Type… I hope. Now you might ask: Jim, when do I add fields to a Class vs. adding fields via Field Groups vs. using Data Types? That’s a good question! Thanks for asking. TL;DR: You technically have to use both a Class and a Field Group. You have to select a Class so AEP knows whether to process data as Records or as a Time-Series. For all of your customizations and templates, I would use Field Groups instead of customizing Classes or using a Data Type. There’s a time and a place for creating custom Classes and Data Types (I think?); but in my opinion, most implementations should have two Classes: one for Records and one for Time-Series. People shouldn’t have to use Data Types. This is probably a pretty hot take.
If you’re a developer, you might say that you could have a bunch of consistent data templates, and that separating them into Classes vs. Field Groups gives a stronger sense of governance; but that’s just it. In theory, it ensures that you don’t deviate from some basic level of governance – some foundational structure. The downside is that now we have to memorize two different concepts. The entire architecture of XDM/Schemas is based on Classes and Mixins. If you’re a developer, this makes TOTAL sense! However, if you’ve spent your career in analytics, you might wonder why some things are the way that they are. As an analyst, I’m not fond of using developer-centric language because it pulls our role further from the business. With that said, my preference is to leverage Field Groups and just use a Class to specify the type of schema.
In my opinion, it might make more sense to get rid of the idea of a “Class” and “Data Type” altogether and just let users specify the schema type (Record or Time-Series), then build schema templates with Field Groups. That would cut down on competing design patterns and confusion.
Final Thoughts
There are a few other areas like Identities and Relationships that have less to do with schemas and more to do with how data interacts within AEP. If you get JSON Schemas, you’ll get AEP Schemas. I think there’s a lot of opportunity for Adobe to streamline the schema building process, and to craft an experience that uses analyst language more than development language. Sometimes it feels like it was designed by developers… and one HUGE value proposition (and opportunity) of AEP is that its ecosystem is so self-contained.
For instance, I don’t have to ingest data with GTM, send it to Google Analytics, pipe it into a BigQuery table, and then use Looker Studio to model it. All of the tools I need live in AEP – and I know that’s a perception thing. You still have to take similar steps. But there’s power in being able to say you know AEP vs. having to list every single tool in Google Cloud individually. That’s a thought exercise for a different article, though.
Up next we’ll talk about Fields and go into all the gory details about each of those options you can toggle. Hope this one on AEP Schemas was useful!