PDB ID Extension FAQ
Frequently Asked Questions
- What will happen to PDB format files once four-character PDB IDs have been consumed?
- PDB format files will not be provided for PDB entries deposited after four-character PDB IDs have been consumed. Best-effort PDB bundle files will no longer be provided for entries issued with extended PDB IDs.
- What is the format of extended PDB IDs?
- An extended PDB ID consists of the prefix “pdb_” followed by eight alphanumeric characters, e.g., pdb_10021abc. This PDB ID format will enable text mining detection of PDB entries in the published literature and allow for more informative and transparent delivery of revised data files.
- How does one derive the new PDB IDs from old IDs?
- All existing four-character PDB IDs will be extended by prefixing “pdb_0000” to the IDs, e.g., PDB ID “1abc” would be listed as “pdb_00001abc” in the _database_2.pdbx_database_accession data item.
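The prefixing rule above can be sketched in a few lines of Python. This is a minimal illustration, not a wwPDB tool: the function name `extend_legacy_id` is hypothetical, and lowercasing the input is an assumption made here so that the result matches the lowercase form used in the examples.

```python
import re

# Assumption: a legacy ID is exactly four ASCII alphanumeric characters.
_LEGACY_ID = re.compile(r"[a-z0-9]{4}")


def extend_legacy_id(legacy_id: str) -> str:
    """Return the extended form of a four-character PDB ID (hypothetical helper)."""
    code = legacy_id.strip().lower()  # assumption: normalize to lowercase
    if _LEGACY_ID.fullmatch(code) is None:
        raise ValueError(f"not a four-character PDB ID: {legacy_id!r}")
    # Rule from the FAQ: prefix "pdb_0000" to the existing ID.
    return "pdb_0000" + code


print(extend_legacy_id("1abc"))  # pdb_00001abc
```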
- Which part of extended PDB IDs should be cited in manuscripts or provided to journals?
- The entire PDB ID, e.g., “pdb_1001ba3c”, should be cited or provided to journals. Users should not omit the prefix or zeros. Users and journals should parse and recognize PDB IDs using the prefix “pdb_”.
- How does OneDep assign extended PDB IDs?
- The wwPDB OneDep tool will assign extended PDB IDs of the form pdb_1xxxxxxx, i.e., “pdb_1” followed by seven alphanumeric characters. When referencing PDB IDs in manuscripts that will be submitted to journals, authors should refrain from abbreviating prefix characters or omitting leading zeros in issued PDB IDs.
- Where is the extended PDB ID stored in a PDB entry?
- What will be the filenames in the PDB archive?
- After the depletion of four-character PDB IDs, filenames for both existing and new entries will be based on extended PDB IDs. For example, the current filename for PDB entry 1abc, 1abc.cif, will become pdb_00001abc.cif.
- How should journals manage PDB IDs within their workflows?
- wwPDB strongly encourages journals to transition swiftly to the new PDB ID format (“pdb_xxxxxxxx”). All existing PDB entries with four-character codes will have extended PDB IDs, e.g., the four-character PDB ID “1abc” will have the extended PDB ID “pdb_00001abc”.
- How will data block names in PDBx/mmCIF files change?
- Data blocks will be named by appending the extended PDB ID to the data_ token, e.g., data_pdb_10021abc.
- How will data block names in structure factor files change?
- Data block names in structure factor files will follow the pattern data_sf_pdb_xxxxxxxx. In the case of multiple data blocks, they will be labeled as data_sf_pdb_xxxxxxxxA for the initial data block, data_sf_pdb_xxxxxxxxB for the second data block, and so forth.
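The naming pattern for structure factor data blocks can be illustrated with a short Python sketch. The function name `sf_block_names` is hypothetical. Two behaviors are assumptions made here rather than anything stated above: a single data block gets no letter suffix, and more than 26 blocks are rejected because no rule is given for what follows “Z”.

```python
import string


def sf_block_names(pdb_id: str, n_blocks: int) -> list[str]:
    """Build structure factor data block names for one entry (hypothetical helper)."""
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    base = f"data_sf_{pdb_id}"
    if n_blocks == 1:
        # Assumption: a lone data block carries no letter suffix.
        return [base]
    if n_blocks > len(string.ascii_uppercase):
        # Assumption: no rule is stated for suffixes beyond "Z".
        raise ValueError("suffix rule beyond 26 data blocks is not specified")
    # Multiple blocks: append A, B, C, ... in order.
    return [base + string.ascii_uppercase[i] for i in range(n_blocks)]


print(sf_block_names("pdb_10021abc", 3))
```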
- What is the regular expression for extended PDB IDs?
- The regular expression defined in the PDBx/mmCIF dictionary is pdb_[a-z0-9]{8}.
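As an illustration of how this expression might be used for validation and text mining, here is a minimal Python sketch. The function names are hypothetical, and the word boundaries (`\b`) in the search pattern are an addition made here to avoid matching inside longer tokens. They are not part of the dictionary definition.

```python
import re

# Pattern as given in the FAQ.
EXTENDED_PDB_ID = re.compile(r"pdb_[a-z0-9]{8}")

# Assumption: word boundaries added for free-text searching.
EXTENDED_PDB_ID_IN_TEXT = re.compile(r"\bpdb_[a-z0-9]{8}\b")


def is_extended_pdb_id(value: str) -> bool:
    """True if the whole string is an extended PDB ID."""
    return EXTENDED_PDB_ID.fullmatch(value) is not None


def find_extended_pdb_ids(text: str) -> list[str]:
    """Return all extended PDB IDs found in free text, in order of appearance."""
    return EXTENDED_PDB_ID_IN_TEXT.findall(text)


sample = "Structures pdb_00001abc and pdb_1001ba3c were compared; pdb_123 is too short."
print(find_extended_pdb_ids(sample))
```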
- Are there example files that can be accessed for software adoption?
- Is there going to be a change in the file directory architecture?
- Yes. All data files for a particular entry will be stored in a single directory, located under a two-character hash directory derived from the third-to-last and second-to-last characters of the PDB ID. To aid users in adapting to this change, a PDB “beta” archive will be provided during the transition phase. This beta archive is expected in 2026. The directory structure will mirror the data organization of the PDB Versioned Archive, i.e.,
https://files-beta.org/pub/pdb/data/entries/<two-letter-hash>/<pdb_accession_code>/<entry_data_File_names>.
The two-letter hash consists of the third-to-last and second-to-last characters of the PDB ID. For example, PDB entry pdb_1abc5678 will be under /67/.
This maintains consistency with the current PDB archive, where PDB entry 1abc is under /ab/.
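The hash rule can be checked with a short Python sketch. The function names `entry_hash` and `entry_dir` are hypothetical, and the sketch returns only the path relative to the entries directory rather than a full URL.

```python
import re

_EXTENDED_PDB_ID = re.compile(r"pdb_[a-z0-9]{8}")


def entry_hash(pdb_id: str) -> str:
    """Return the two-character hash: third-to-last and second-to-last characters."""
    if _EXTENDED_PDB_ID.fullmatch(pdb_id) is None:
        raise ValueError(f"not an extended PDB ID: {pdb_id!r}")
    return pdb_id[-3:-1]


def entry_dir(pdb_id: str) -> str:
    """Return '<two-letter-hash>/<pdb_accession_code>' for an entry."""
    return f"{entry_hash(pdb_id)}/{pdb_id}"


print(entry_dir("pdb_1abc5678"))  # 67/pdb_1abc5678
print(entry_dir("pdb_00001abc"))  # ab/pdb_00001abc
```

The second call shows the consistency noted above: the extended form of 1abc hashes to the same “ab” directory that the entry uses in the current archive.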
- Will the already issued 4-character PDB IDs be kept in the files?
- Yes. The 4-character PDB IDs of existing entries will still be available at _database_2.database_code in the mmCIF structure files.
- Will the existing PDB format files be retained in the “beta” archive?
- Yes. The existing legacy PDB format files will not be removed from the main PDB archive when the “beta” archive becomes the main archive.