Developing for and maintaining search platform tools requires understanding the topics below:
- Elasticsearch — Elastic fundamentals | Elastic Docs
- Python/Django Framework — Django
- Django REST Framework — Django REST framework
- Redis — https://redis.io/
- MySQL — MySQL 8.0 Reference Manual — 1.2.1 What is MySQL?
- Amazon Web Services (AWS) Basics — Why AWS?
- Git Basics — Git — What is Git?
- Docker Basics — What is Docker?
- AWS Elastic Beanstalk — What is AWS Elastic Beanstalk?
- AWS Elastic Container Service (ECS) — Amazon ECS
- Elasticsearch server — Download Elasticsearch
- Recent Elasticsearch releases ship with a bundled JDK. Older versions require a separately installed Java Development Kit (JDK); OpenJDK should work (e.g., via Homebrew).
- Elasticvue — Elasticvue
- Miniconda — Miniconda (Anaconda)
- PyCharm Professional — Download PyCharm (JetBrains)
- VS Code (works in a pinch) — https://code.visualstudio.com/Download
- Insomnia — The Collaborative API Development Platform
- Redis install (macOS) — https://redis.io/docs/latest/ or Redis downloads — Downloads Redis
- RedisInsight — Redis Insight (Free GUI & CLI Tool for Redis)
- MySQL Server — MySQL Community Downloads
- Sequel Ace — Sequel Ace
- DataGrip (if you already have the license) — DataGrip | JetBrains
- Windows option — MySQL Workbench
- AWS Command Line Interface (CLI) — AWS CLI
- AWS Elastic Beanstalk CLI (EB CLI) — AWS Elastic Beanstalk (EB CLI)
- Docker Desktop — Docker Desktop: The #1 Containerization Tool for Developers
- Node.js — https://nodejs.org/en/download
- Docusaurus — Installation | Docusaurus
Node can also be installed via Homebrew on macOS. Docusaurus can then be installed by navigating to the docs folder inside your PatentSearch-API clone and running:
npm install
The required Node packages (including Docusaurus) will be detected from package.json.
- Elasticsearch local installation (quickstart) — Local development installation (quickstart) | Elastic Docs
- Elasticvue usage — https://elasticvue.com/usage
- Django — Getting started with Django
- Django project setup
- Django REST Framework (DRF) Quickstart — Quickstart | Django REST framework
- DRF In-depth Tutorial — 1 - Serialization | Django REST framework
- Pytest — Get Started — pytest documentation
- Learn Git Branching — Learn Git Branching
- GitHub onboarding — Start your journey | GitHub Docs
- Docker 101 Tutorial — Docker 101 Tutorial | Docker
- Deploy Django to Elastic Beanstalk — Deploying a Django application to Elastic Beanstalk | AWS Elastic Beanstalk
- Elasticsearch Schema/Mapping Definition — Mapping | Elastic Docs
Django REST Framework offers multiple levels of abstraction for serializer and view design. For serializers, these are (listed from more explicit to more abstract):
- Explicitly designed and declared `Serializer`
- `ModelSerializer`
Similarly, for views there are several levels of abstraction (listed from more explicit to more abstract):
- Standalone (decorated) Python functions serving various requests (GET, POST, etc.)
- Class-based views where the class maps to a URL and methods in the class serve different requests
- Class-based views with DRF Mixins
- Generic class-based views
The PatentSearch API uses explicitly declared serializers and class-based views with custom mixins.
There are multiple groups of elements in an API’s configuration that need to be aligned in non-obvious ways.
(All paths referenced originate at the root of the PatentSearch-API repo.)
By convention, the index name should match the response name where possible, but this is not a strict requirement.
- The name (or an alias) for the Elasticsearch index that stores the data for the endpoint
- The string value assigned to `self.index` in the `__init__` method of the `{entity}Endpoint` class in: `API/endpoints/{entity}_endpoint_configuration.py`
- The final named argument of the `__init__` method of the `{entity}ResponseDocument` class in: `API/endpoints/{entity}_endpoint_configuration.py`
By convention, the response name is the plural form of the endpoint path, but this is not a strict requirement.
- The name of the object attribute assigned in the `__init__` method of the `{entity}ResponseDocument` class in: `API/endpoints/{entity}_endpoint_configuration.py`
- The first key under `properties` under `{entity}SuccessResponse` in: `API/static/openapi.json`
- The variable name assigned to the `{entity}Serializer` object in the `APISerializer` class in: `API/serializers/APISerializer.py`
By convention, the endpoint path is the word for a single entity within the endpoint/index (e.g., inventor).
- The string used as the first argument of the `path` or `re_path` function call within the `urlpatterns` list in: `API/urls.py`
- The last portion of the path used as a key for that endpoint under `paths` in: `API/static/openapi.json`, appended to the standard path prefix: `/api/v1/`, `/api/v1/patent/`, or `/api/v1/publication/` (corresponds to the Swagger page)
- The variable names in the class `{entity}Serializer` in: `API/endpoints/{entity}_endpoint_configuration.py`
- The keys within: `components` → `schemas` → `{entity}SuccessResponse` → `properties` → `{ResponseName}` → `items` → `properties` in `API/static/openapi.json`
- The GET defaults located at: `paths` → `/api/v1/{EndpointPath}` → `get` → `parameters` → `schema` → `default` in `API/static/openapi.json` (for both the `f` and `s` parameters)
- The POST defaults located at: `components` → `schemas` → `{entity}PostRequestBody` → `properties` → `f`/`s` → `default` in `API/static/openapi.json` (for both the `f` and `s` parameters)
- The field names in the Elasticsearch index that stores the data for the endpoint
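Because these field names live in several files, a small script can help spot drift between them. The following is a minimal sketch, not part of the repository: the helper names `documented_fields` and `field_mismatches`, the sample entity `Inventor`, the response name `inventors`, and every field name are hypothetical. It assumes `openapi.json` has already been parsed into a dictionary and that the serializer and index field names have been collected separately.

```python
def documented_fields(openapi, entity, response_name):
    """Return the field names listed for one endpoint's success response.

    Follows components -> schemas -> {entity}SuccessResponse -> properties
    -> {ResponseName} -> items -> properties, as described above.
    """
    schema = openapi["components"]["schemas"][f"{entity}SuccessResponse"]
    return set(schema["properties"][response_name]["items"]["properties"])


def field_mismatches(openapi, entity, response_name, serializer_fields, index_fields):
    """Compare serializer fields against openapi.json and the index fields."""
    serializer_fields = set(serializer_fields)
    documented = documented_fields(openapi, entity, response_name)
    return {
        "in_serializer_not_in_openapi": sorted(serializer_fields - documented),
        "in_openapi_not_in_serializer": sorted(documented - serializer_fields),
        "in_serializer_not_in_index": sorted(serializer_fields - set(index_fields)),
    }


# Hypothetical excerpt of API/static/openapi.json after json.load().
sample_openapi = {
    "components": {
        "schemas": {
            "InventorSuccessResponse": {
                "properties": {
                    "inventors": {
                        "items": {
                            "properties": {
                                "inventor_id": {"type": "string"},
                                "inventor_name_last": {"type": "string"},
                            }
                        }
                    }
                }
            }
        }
    }
}

report = field_mismatches(
    sample_openapi,
    entity="Inventor",
    response_name="inventors",
    serializer_fields=["inventor_id", "inventor_name_last", "inventor_gender"],
    index_fields=["inventor_id", "inventor_name_last"],
)
for problem, names in report.items():
    print(problem, names)
```

In this made-up example the serializer declares `inventor_gender`, which is missing from both the OpenAPI schema and the index, so it appears under two of the three headings. The same set-difference approach could be extended to the GET and POST defaults for `f` and `s`.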
This tutorial walks an onboarding developer step-by-step through creating an endpoint in the PatentSearch-API.
- Ensure local dependencies are available (MySQL, Elasticsearch, Redis as needed, and project dependencies installed).
- Identify the source table(s) and the target Elasticsearch index for the new endpoint.
- Connect to your local MySQL instance.
- Create (or identify) a database and table(s) that will act as the source for the endpoint.
- Verify the table(s) contain the expected columns and sample data.
- Create an Elasticsearch index schema file (JSON) with fields for all columns required by the endpoint. (See Elasticsearch mapping tutorials.)
- Create a data loading project/folder in PyCharm/VS Code.
- Install the `es-data-load` package.
- Using `es-data-load`, create the target index in your local Elasticsearch instance.
- Connect using Elasticvue and verify the index exists and has the expected mapping.
- Create a MySQL → Elasticsearch mapping file.
- Run the data load using the `es-data-load` package.
- Verify documents exist in the index (via Elasticvue or Elasticsearch queries).
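For the index schema step above, the sketch below generates a standard Elasticsearch mapping body from a list of columns. It is an illustration under assumptions: the function name `build_index_schema`, the column names, and the output filename are hypothetical, and the exact file layout that `es-data-load` expects is not described here, so check that package's documentation before using the result.

```python
import json


def build_index_schema(columns):
    """Build an Elasticsearch mapping body from {column name: field type}.

    "text" fields get a "keyword" sub-field so that exact-match and sort
    operations can target "<field>.keyword".
    """
    properties = {}
    for name, es_type in columns.items():
        field = {"type": es_type}
        if es_type == "text":
            field["fields"] = {"keyword": {"type": "keyword", "ignore_above": 256}}
        properties[name] = field
    return {"mappings": {"properties": properties}}


# Hypothetical columns taken from the source table.
schema = build_index_schema(
    {
        "inventor_id": "keyword",
        "inventor_name_last": "text",
        "inventor_num_patents": "integer",
    }
)

# Serialize to JSON; write this string to a file such as inventor_schema.json.
print(json.dumps(schema, indent=2))
```

Fields declared as `text` here are the ones that would later need an entry in the endpoint's keyword field translations, since keyword-like operations on them use the `.keyword` suffix.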
Create a serializer class that inherits from serializers.Serializer:
- DRF tutorial: Tutorial 1 - Serialization
Example:
from rest_framework import serializers


class {entity_name}Serializer(serializers.Serializer):
    # Declare the fields included in the endpoint, e.g.:
    # id = serializers.CharField()
    # name = serializers.CharField()
    pass

Create a generic class to define endpoint configuration and response document structure.
Replace `{entity_name}` with the endpoint/entity name (e.g., `patent`, `assignee`, `inventor`) as it applies.
# APIResponseDocument is the project's base response class; import it from
# wherever it is defined in the repo.

# Defines the wrapper for a list of response documents
class {entity_name}ResponseDocument(APIResponseDocument):
    def __init__(self, error, count, total_hits, {entity_name}):
        super().__init__(error, count, total_hits)
        self.{entity_name} = {entity_name}


# Fields, operators, Elasticsearch, and other configuration for the endpoint
class {entity_name}Endpoint:
    def __init__(self, **kwargs):
        # Elasticsearch index
        self.index = "{entity_name}"
        # Fields which are configured as "text" type in ES and consequently require
        # the ".keyword" suffix for keyword-like operations
        self.keyword_field_translations = []
        # Default f and s fields
        self.f = ["id"]
        self.s = [{"id": "asc"}]
        # List of all allowed fields
        self.field_list = list({entity_name}Serializer.__dict__["_declared_fields"].keys())
        # Wrapper that defines the format for the API response
        self.response_encoder = {entity_name}ResponseDocument

Create DRF view classes (one for list view and one for detail view):
# DRF view for showing multiple entities
class {entity_name}List(PVAPIListView, {entity_name}Endpoint):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        {entity_name}Endpoint.__init__(self)


# DRF view for showing a single entity
class {entity_name}Detail(PVAPIDetailView, {entity_name}Endpoint):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        {entity_name}Endpoint.__init__(self)
        self.pk_field = "id"

Add URL patterns to API/urls.py:
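from django.urls import path, re_path

# {entity_name}Detail and {entity_name}List are the view classes defined above;
# import them from the module where they live in the project.
urlpatterns = [
    path("{entity_name}/<str:pk>/", {entity_name}Detail.as_view(), name="{entity_name}-detail"),
    re_path(r"^{entity_name}/?$", {entity_name}List.as_view(), name="{entity_name}-list"),
]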
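To see which request paths the two URL patterns above would accept, the sketch below approximates them with plain regular expressions from the standard library. It is an approximation under assumptions: the `build_patterns` helper and the `inventor` entity are hypothetical, the candidate strings stand for the part of the URL that remains after any prefix handled elsewhere in the project, and Django's own resolver does the real matching.

```python
import re


def build_patterns(entity_name):
    """Approximate the detail and list routes with regular expressions.

    Django's "<str:pk>" converter matches one or more non-slash characters;
    the list route is already written as a regular expression.
    """
    name = re.escape(entity_name)
    detail = re.compile(rf"^{name}/(?P<pk>[^/]+)/$")
    listing = re.compile(rf"^{name}/?$")
    return detail, listing


detail, listing = build_patterns("inventor")

for candidate in ("inventor", "inventor/", "inventor/abc123/", "inventor/abc123"):
    if listing.search(candidate):
        print(f"{candidate!r} -> list view")
    elif (match := detail.search(candidate)):
        print(f"{candidate!r} -> detail view, pk={match.group('pk')!r}")
    else:
        print(f"{candidate!r} -> no match")
```

The list route accepts the path with or without a trailing slash because of the `/?` in its pattern, while the detail route as written requires the trailing slash. In a running Django project, the `APPEND_SLASH` setting may redirect a slashless detail request rather than return a 404.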