mcs07 · Aug 8, 2025
diff --git a/README.md b/README.md
@@ -5,7 +5,7 @@
 
 PubChemPy provides a way to interact with PubChem in Python. It allows chemical searches by name, substructure and similarity, chemical standardization, conversion between chemical file formats, depiction and retrieval of chemical properties.
 
-```python
+```pycon
 >>> from pubchempy import get_compounds, Compound
 >>> comp = Compound.from_cid(1423)
 >>> print(comp.smiles)

diff --git a/docs/guide/advanced.md b/docs/guide/advanced.md
@@ -1,6 +1,6 @@
 (advanced)=
 
-# Advanced Usage
+# Advanced usage
 
 This guide covers advanced PubChemPy usage patterns, API best practices, error handling, logging, and low-level request functions.
 
@@ -13,16 +13,16 @@ If there are too many results for a request, you will receive a TimeoutError. Th
 If retrieving full compound or substance records, instead request a list of cids or sids for your input, and then request the full records for those identifiers individually or in small groups. For example:
 
 ```python
-sids = get_sids('Aspirin', 'name')
+sids = get_sids("Aspirin", "name")
 for sid in sids:
     s = Substance.from_sid(sid)
 ```
 
 When using the `formula` namespace or a `searchtype`, you can alternatively use the `listkey_count` and `listkey_start` keyword arguments to specify pagination. The `listkey_count` value specifies the number of results per page, and the `listkey_start` value specifies which page to return. For example:
 
 ```python
-get_compounds('CC', 'smiles', searchtype='substructure', listkey_count=5)
-get('C10H21N', 'formula', listkey_count=3, listkey_start=6)
+get_compounds("CC", "smiles", searchtype="substructure", listkey_count=5)
+get("C10H21N", "formula", listkey_count=3, listkey_start=6)
 ```
 
 ## Logging
@@ -61,8 +61,8 @@ A simple fix is to specify the proxy information via urllib:
 ```python
 import urllib.request
 proxy_support = urllib.request.ProxyHandler({
-    'http': 'http://<proxy.address>:<port>',
-    'https': 'https://<proxy.address>:<port>'
+    "http": "http://<proxy.address>:<port>",
+    "https": "https://<proxy.address>:<port>"
 })
 opener = urllib.request.build_opener(proxy_support)
 urllib.request.install_opener(opener)

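A note on the batching advice in `docs/guide/advanced.md`: requesting records "individually or in small groups" could be sketched roughly as below. `chunked` and `fetch_in_batches` are hypothetical helpers, not part of PubChemPy, the batch size of 50 is an arbitrary assumption, and `fetch` stands for whatever callable retrieves records for a list of identifiers.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(ids: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_in_batches(ids: Sequence[int], fetch: Callable[[List[int]], list], size: int = 50) -> list:
    """Call `fetch` once per batch and collect the results in one flat list."""
    records = []
    for batch in chunked(ids, size):
        # Each call only asks for a small group of identifiers,
        # which keeps individual requests short.
        records.extend(fetch(list(batch)))
    return records
```

With PubChemPy imported as `pcp`, `fetch` might be something like `lambda batch: pcp.get_substances(batch)`, mirroring the list-of-SIDs call shown in the pandas guide.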
diff --git a/docs/guide/compound.md b/docs/guide/compound.md
@@ -1,6 +1,6 @@
 (compound)=
 
-# Compound
+# Compounds
 
 The {func}`~pubchempy.get_compounds` function returns a list of {class}`~pubchempy.Compound` objects. You can also instantiate a {class}`~pubchempy.Compound` object directly if you know its CID:
 
@@ -14,9 +14,9 @@ Each {class}`~pubchempy.Compound` has a `record` property, which is a dictionary
 
 Additionally, each {class}`~pubchempy.Compound` provides a {meth}`~pubchempy.Compound.to_dict` method that returns PubChemPy's own dictionary representation of the Compound data. As well as being more concisely formatted than the raw `record`, this method also takes an optional parameter to filter the list of the desired properties:
 
-```python
+```pycon
 >>> c = pcp.Compound.from_cid(962)
->>> c.to_dict(properties=['atoms', 'bonds', 'inchi'])
+>>> c.to_dict(properties=["atoms", "bonds", "inchi"])
 {'atoms': [{'aid': 1, 'element': 'o', 'x': 2.5369, 'y': -0.155},
            {'aid': 2, 'element': 'h', 'x': 3.0739, 'y': 0.155},
            {'aid': 3, 'element': 'h', 'x': 2, 'y': 0.155}],
@@ -25,7 +25,13 @@ Additionally, each {class}`~pubchempy.Compound` provides a {meth}`~pubchempy.Com
  'inchi': u'InChI=1S/H2O/h1H2'}
 ```
 
-## 3D Compounds
+## 3D compounds
+
+By default, compounds are returned with 2D coordinates. Use the `record_type` keyword argument to specify otherwise:
+
+```python
+pcp.get_compounds("Aspirin", "name", record_type="3d")
+```
 
 Many properties are missing from 3D records, and the following properties are *only* available on 3D records:
 

diff --git a/docs/guide/contribute.md b/docs/guide/contribute.md
@@ -1,6 +1,6 @@
 (contribute)=
 
-# Contribute
+# Contributing
 
 The [Issue Tracker] is the best place to post any feature ideas, requests and bug reports.
 

diff --git a/docs/guide/download.md b/docs/guide/download.md
@@ -7,8 +7,8 @@ The {func}`~pubchempy.download` function is for saving a file to disk. The follo
 Examples:
 
 ```python
-pcp.download('PNG', 'asp.png', 'Aspirin', 'name')
-pcp.download('CSV', 's.csv', [1,2,3], operation='property/ConnectivitySMILES,SMILES')
+pcp.download("PNG", "asp.png", "Aspirin", "name")
+pcp.download("CSV", "s.csv", [1,2,3], operation="property/ConnectivitySMILES,SMILES")
 ```
 
 For PNG images, the `image_size` argument can be used to specify `large`, `small`

diff --git a/docs/guide/gettingstarted.md b/docs/guide/gettingstarted.md
@@ -10,19 +10,19 @@ Retrieving information about a specific Compound in the PubChem database is simp
 
 Begin by importing PubChemPy:
 
-```python
+```pycon
 >>> import pubchempy as pcp
 ```
 
 Let's get the {class}`~pubchempy.Compound` with [CID 5090]:
 
-```python
+```pycon
 >>> c = pcp.Compound.from_cid(5090)
 ```
 
 Now we have a {class}`~pubchempy.Compound` object called `c`. We can get all the information we need from this object:
 
-```python
+```pycon
 >>> print(c.molecular_formula)
 C17H14O4S
 >>> print(c.molecular_weight)
@@ -43,34 +43,42 @@ All the code examples in this documentation will assume you have imported PubChe
 ```python
 from pubchempy import Compound, get_compounds
 c = Compound.from_cid(1423)
-cs = get_compounds('Aspirin', 'name')
+cs = get_compounds("Aspirin", "name")
 ```
 ````
 
 ## Searching
 
-What if you don't know the PubChem CID of the Compound you want? Just use the {func}`~pubchempy.get_compounds` function:
+What if you don't know the PubChem CID of the Compound you want? Just use the {func}`~pubchempy.get_compounds` function, for example with a compound name input:
 
-```python
->>> results = pcp.get_compounds('Glucose', 'name')
+```pycon
+>>> results = pcp.get_compounds("Glucose", "name")
 >>> print(results)
 [Compound(5793)]
 ```
 
-The first argument is the identifier, and the second argument is the identifier type, which must be one of `name`, `smiles`, `sdf`, `inchi`, `inchikey` or `formula`. It looks like there are 4 compounds in the PubChem Database that have the name Glucose associated with them. Let's take a look at them in more detail:
+The first argument is the identifier, and the second argument is the identifier type, which must be one of `name`, `smiles`, `sdf`, `inchi`, `inchikey` or `formula`. More often than not, only a single result will be returned, but sometimes there are multiple results for a given identifier. Therefore, {func}`~pubchempy.get_compounds` returns a list of {class}`~pubchempy.Compound` objects (even if there is only one result).
 
-```python
+It is possible to iterate over this list to get the individual {class}`~pubchempy.Compound` objects:
+
+```pycon
 >>> for compound in results:
 ...    print(compound.smiles)
 C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O
 ```
 
-It looks like they all have different stereochemistry information.
+Or you can access the first result directly:
 
-Retrieving the record for a SMILES string is just as easy:
+```pycon
+>>> compound = results[0]
+>>> print(compound.smiles)
+C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O
+```
 
-```python
->>> pcp.get_compounds('C1=CC2=C(C3=C(C=CC=N3)C=C2)N=C1', 'smiles')
+Retrieving the compound record(s) for a SMILES input is just as easy:
+
+```pycon
+>>> pcp.get_compounds("C1=CC2=C(C3=C(C=CC=N3)C=C2)N=C1", "smiles")
 [Compound(1318)]
 ```
 

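Because `get_compounds` returns a list even for a single hit, indexing `results[0]` raises an `IndexError` when the list is empty. A minimal sketch of a guard follows; `first_or_none` is a hypothetical helper (not part of PubChemPy), and it assumes that an unmatched identifier yields an empty list.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def first_or_none(results: Sequence[T]) -> Optional[T]:
    """Return the first search result, or None if the search found nothing."""
    return results[0] if results else None
```

Usage would look like `compound = first_or_none(pcp.get_compounds("Glucose", "name"))`, followed by an explicit `if compound is None:` check before reading any properties.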
diff --git a/docs/guide/introduction.md b/docs/guide/introduction.md
@@ -8,12 +8,7 @@ PubChemPy relies entirely on the PubChem database and chemical toolkits provided
 
 This is important to remember when using PubChemPy: Every request you make is transmitted to the PubChem servers, evaluated, and then a response is sent back. There are some downsides to this: It is less suitable for confidential work, it requires a constant internet connection, and some tasks will be slower than if they were performed locally on your own computer. On the other hand, this means we have the vast resources of the PubChem database and chemical toolkits at our disposal. As a result, it is possible to do complex similarity and substructure searching against a database containing tens of millions of compounds in seconds, without needing any of the storage space or computational power on your own local computer.
 
-## The PUG REST web service
-
-You don't need to worry too much about how the PubChem web service works, because PubChemPy handles all of the details for you. But if you want to go beyond the capabilities of PubChemPy, there is some helpful documentation on the PubChem website.
-
-- [PUG REST Tutorial]: Explains how the web service works with a variety of usage examples.
-- [PUG REST Specification]: A more comprehensive but dense specification that details every possible way to use the web service.
+See the {doc}`pugrest` page for more information about how PubChemPy uses the PubChem web service.
 
 ## PubChemPy license
 
@@ -27,6 +22,4 @@ You don't need to worry too much about how the PubChem web service works, becaus
 [^f1]: That's a lot of acronyms! PUG stands for "Power User Gateway", a term used to describe a variety of methods for programmatic access to PubChem data and services. REST stands for [Representational State Transfer], which describes the specific architectural style of the web service.
 
 [pubchem website]: https://pubchem.ncbi.nlm.nih.gov
-[pug rest specification]: https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest
-[pug rest tutorial]: https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest-tutorial
 [representational state transfer]: https://en.wikipedia.org/wiki/Representational_state_transfer
diff --git a/docs/guide/pandas.md b/docs/guide/pandas.md
@@ -2,31 +2,29 @@
 
 # *pandas* integration
 
-## Getting *pandas*
+## Installing *pandas*
 
-*pandas* must be installed to use its functionality from within PubChemPy. The easiest way is to use pip:
+*pandas* must be installed to use its functionality from within PubChemPy. It is an optional dependency, so it is not installed automatically with PubChemPy. The easiest way is to use pip:
 
 ```bash
 pip install pandas
 ```
 
-See the [pandas documentation] for more information.
+See the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/) for more information.
 
 ## Usage
 
 It is possible for {func}`~pubchempy.get_compounds`, {func}`~pubchempy.get_substances` and {func}`~pubchempy.get_properties` to return a pandas DataFrame:
 
 ```python
-df1 = pcp.get_compounds('C20H41Br', 'formula', as_dataframe=True)
+df1 = pcp.get_compounds("C20H41Br", "formula", as_dataframe=True)
 df2 = pcp.get_substances([1, 2, 3, 4], as_dataframe=True)
-df3 = pcp.get_properties(['smiles', 'xlogp', 'rotatable_bond_count'], 'C20H41Br', 'formula', as_dataframe=True)
+df3 = pcp.get_properties(["smiles", "xlogp", "rotatable_bond_count"], "C20H41Br", "formula", as_dataframe=True)
 ```
 
 An existing list of {class}`~pubchempy.Compound` objects can be converted into a dataframe, optionally specifying the desired columns:
 
 ```python
-cs = pcp.get_compounds('C20H41Br', 'formula')
-df4 = pcp.compounds_to_frame(cs, properties=['smiles', 'xlogp', 'rotatable_bond_count'])
+cs = pcp.get_compounds("C20H41Br", "formula")
+df4 = pcp.compounds_to_frame(cs, properties=["smiles", "xlogp", "rotatable_bond_count"])
 ```
-
-[pandas documentation]: https://pandas.pydata.org/pandas-docs/stable/
diff --git a/docs/guide/properties.md b/docs/guide/properties.md
@@ -5,7 +5,7 @@
 The {func}`~pubchempy.get_properties` function allows the retrieval of specific properties without having to deal with entire compound records. This is especially useful for retrieving the properties of a large number of compounds at once:
 
 ```python
-p = pcp.get_properties('SMILES', 'CC', 'smiles', searchtype='superstructure')
+p = pcp.get_properties("SMILES", "CC", "smiles", searchtype="superstructure")
 ```
 
 Multiple properties may be specified in a list, or in a comma-separated string. The available properties are: MolecularFormula, MolecularWeight, ConnectivitySMILES, SMILES, InChI, InChIKey, IUPACName, XLogP, ExactMass, MonoisotopicMass, TPSA, Complexity, Charge, HBondDonorCount, HBondAcceptorCount, RotatableBondCount, HeavyAtomCount, IsotopeAtomCount, AtomStereoCount, DefinedAtomStereoCount, UndefinedAtomStereoCount, BondStereoCount, DefinedBondStereoCount, UndefinedBondStereoCount, CovalentUnitCount, Volume3D, XStericQuadrupole3D, YStericQuadrupole3D, ZStericQuadrupole3D, FeatureCount3D, FeatureAcceptorCount3D, FeatureDonorCount3D, FeatureAnionCount3D, FeatureCationCount3D, FeatureRingCount3D, FeatureHydrophobeCount3D, ConformerModelRMSD3D, EffectiveRotorCount3D, ConformerCount3D.
@@ -15,8 +15,8 @@ Multiple properties may be specified in a list, or in a comma-separated string.
 Get a list of synonyms for a given input using the {func}`~pubchempy.get_synonyms` function:
 
 ```python
-pcp.get_synonyms('Aspirin', 'name')
-pcp.get_synonyms('Aspirin', 'name', 'substance')
+pcp.get_synonyms("Aspirin", "name")
+pcp.get_synonyms("Aspirin", "name", "substance")
 ```
 
 Inputs that match more than one SID/CID will have multiple, separate synonyms lists returned.
@@ -26,14 +26,14 @@ Inputs that match more than one SID/CID will have multiple, separate synonyms li
 CAS Registry Numbers are not officially supported by PubChem, but they are often present in the synonyms associated with a compound. Therefore it is straightforward to retrieve them by filtering the synonyms to just those with the CAS Registry Number format:
 
 ```python
-for result in pcp.get_synonyms('Aspirin', 'name'):
-    cid = result['CID']
+for result in pcp.get_synonyms("Aspirin", "name"):
+    cid = result["CID"]
     cas_rns = []
-    for syn in result.get('Synonym', []):
-        match = re.match(r'(\d{2,7}-\d\d-\d)', syn)
+    for syn in result.get("Synonym", []):
+        match = re.match(r"(\d{2,7}-\d\d-\d)", syn)
         if match:
             cas_rns.append(match.group(1))
-    print(f'CAS registry numbers for CID {cid}: {cas_rns}')
+    print(f"CAS registry numbers for CID {cid}: {cas_rns}")
 ```
 
 ## Identifiers

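The CAS filtering snippet in `docs/guide/properties.md` uses `re` without importing it, and `re.match` anchors only at the start of each synonym. A self-contained sketch of the same filter is below. `extract_cas_rns` is a hypothetical helper, the sample dictionary is hand-written to mimic the `CID`/`Synonym` shape used in the snippet rather than real `get_synonyms` output, and the switch to `fullmatch` is a substitution made here, not the documented approach.

```python
import re

# Two to seven digits, then two digits, then one check digit, hyphen-separated.
CAS_RN_PATTERN = re.compile(r"\d{2,7}-\d\d-\d")


def extract_cas_rns(result: dict) -> list:
    """Return the synonyms in one result that look like CAS Registry Numbers."""
    return [
        syn
        for syn in result.get("Synonym", [])
        # fullmatch rejects strings with trailing text after the number.
        if CAS_RN_PATTERN.fullmatch(syn)
    ]


# Hand-written stand-in for one entry of pcp.get_synonyms("Aspirin", "name").
sample = {"CID": 2244, "Synonym": ["aspirin", "50-78-2", "2-Acetoxybenzoic acid"]}
print(f"CAS registry numbers for CID {sample['CID']}: {extract_cas_rns(sample)}")
```

The pattern checks only the format; it does not validate the CAS check digit.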
diff --git a/docs/guide/pugrest.md b/docs/guide/pugrest.md
@@ -0,0 +1,44 @@
+(pugrest)=
+
+# PUG REST
+
+PUG (Power User Gateway) REST is a web service that PubChem provides for programmatic access to its data. PubChemPy uses this web service to interact with the PubChem database, allowing you to search for compounds, substances, and assays, retrieve their properties, and perform various operations without needing to download or store large datasets locally.
+
+You don't need to worry too much about how the PubChem web service works, because PubChemPy handles all of the details for you. But understanding the underlying architecture can help you use PubChemPy more effectively and troubleshoot issues.
+
+## PUG REST architecture
+
+The PUG REST API is built around a three-part request pattern:
+
+1. **Input**: Specifies which records you're interested in (by CID, name, SMILES, etc.)
+2. **Operation**: Defines what to do with those records (retrieve properties, search, etc.)
+3. **Output**: Determines the format of the returned data (JSON, XML, CSV, etc.)
+
+This modular design allows for flexible combinations. For example, you can combine structure input via SMILES with property retrieval operations and CSV output - all handled seamlessly by PubChemPy.
+
+## Request flow
+
+When you make a request with PubChemPy:
+
+1. Your Python request is translated into a PUG REST URL (and possibly some POST data).
+2. The request is sent to PubChem's servers via HTTPS.
+3. PubChem processes the request using their chemical databases and toolkits.
+4. Results are returned and parsed by PubChemPy into Python objects.
+
+PubChem contains over 300 million substance records, over 100 million standardized compound records, and over 1 million biological assays. All this data may be accessed and processed through PubChemPy without requiring local storage or computational resources.
+
+## When to use alternatives
+
+While PubChemPy and PUG REST are excellent for many tasks, consider alternatives for:
+
+- **Bulk data processing**: Use PubChem's bulk download services for large datasets
+- **Confidential work**: Consider local chemical toolkits for sensitive data
+- **Offline work**: The PUG REST API requires an internet connection
+
+## Further reading
+
+If you want to go beyond the capabilities of PubChemPy, there is helpful documentation about programmatic access to PubChem data on the PubChem website:
+
+- [Programmatic Access to PubChem](https://pubchem.ncbi.nlm.nih.gov/docs/programmatic-access): Overview of how to access PubChem data programmatically.
+- [PUG REST Tutorial](https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest-tutorial): Explains how the web service works with a variety of usage examples.
+- [PUG REST Specification](https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest): A more comprehensive but dense specification that details every possible way to use the web service.
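The input/operation/output pattern described in `docs/guide/pugrest.md` maps onto the path of a PUG REST URL. A rough sketch follows, assuming the public base URL `https://pubchem.ncbi.nlm.nih.gov/rest/pug`. `build_pug_rest_url` is a hypothetical helper for illustration only and is not how PubChemPy builds its requests internally.

```python
from urllib.parse import quote

BASE_URL = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_pug_rest_url(domain, namespace, identifier, operation, output):
    """Join the input, operation and output parts into one request URL."""
    # Input: which records are wanted, e.g. compound/name/aspirin.
    input_part = f"{domain}/{namespace}/{quote(str(identifier), safe='')}"
    # Operation may itself contain a slash, e.g. property/MolecularFormula.
    return f"{BASE_URL}/{input_part}/{operation}/{output}"


url = build_pug_rest_url("compound", "name", "aspirin", "property/MolecularFormula", "CSV")
print(url)
```

Identifiers that do not fit comfortably in a URL path, such as long SMILES or InChI strings, are the kind of input the request-flow description means by "possibly some POST data"; this sketch does not cover that case.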
diff --git a/docs/guide/searching.md b/docs/guide/searching.md
@@ -2,20 +2,12 @@
 
 # Searching
 
-## 2D and 3D coordinates
+PubChemPy provides powerful search capabilities that leverage PubChem's extensive chemical databases. Understanding the different search types and their performance characteristics can help you choose the most efficient approach for your needs.
 
-By default, compounds are returned with 2D coordinates. Use the `record_type` keyword argument to specify otherwise:
+By default, requests look for an exact match with the input. Alternatively, you can specify a search type using the `searchtype` parameter to perform chemical substructure, superstructure, similarity, or identity searches.
 
 ```python
-pcp.get_compounds('Aspirin', 'name', record_type='3d')
-```
-
-## Advanced search types
-
-By default, requests look for an exact match with the input. Alternatively, you can specify substructure, superstructure, similarity and identity searches using the `searchtype` keyword argument:
-
-```python
-pcp.get_compounds('CC', 'smiles', searchtype='superstructure', listkey_count=3)
+pcp.get_compounds("CC", "smiles", searchtype="superstructure", listkey_count=3)
 ```
 
 The `listkey_count` and `listkey_start` arguments can be used for pagination. Each `searchtype` has its own options that can be specified as keyword arguments. For example, similarity searches have a `Threshold`, and super/substructure searches have `MatchIsotopes`. A full list of options is available in the [PUG REST Specification].
@@ -31,7 +23,7 @@ Unfortunately it isn't directly possible to return to the previous behaviour, bu
 There are a few different ways you can do this using PubChemPy, but the easiest is probably using the {func}`~pubchempy.get_cids` function:
 
 > ```pycon
-> >>> pcp.get_cids('2-nonenal', 'name', 'substance', list_return='flat')
+> >>> pcp.get_cids("2-nonenal", "name", "substance", list_return="flat")
 > [17166, 5283335, 5354833]
 > ```
 
@@ -40,7 +32,7 @@ This searches the substance database for '2-nonenal', and gets the CID for the c
 You can then use {meth}`~pubchempy.Compound.from_cid` to get the full {class}`~pubchempy.Compound` record, equivalent to what is returned by {func}`~pubchempy.get_compounds`:
 
 > ```pycon
-> >>> cids = pcp.get_cids('2-nonenal', 'name', 'substance', list_return='flat')
+> >>> cids = pcp.get_cids("2-nonenal", "name", "substance", list_return="flat")
 > >>> [pcp.Compound.from_cid(cid) for cid in cids]
 > [Compound(17166), Compound(5283335), Compound(5354833)]
 > ```