Text Encryptor for Students: An Expert Deep-Dive Into AES-256-GCM Encryption for Academic Privacy
A rigorous technical exploration of how the browser-based Text Encryptor implements authenticated encryption for academic and research data, from the AES-256-GCM algorithm internals and PBKDF2 key derivation mathematics to the GHASH authentication math in GF(2^128) and practical integration patterns for research data protection, IRB compliance, and collaborative academic work.
Cryptographic Architecture: How Browser-Based AES-256-GCM Encryption Actually Works
Most students encounter encryption through black-box interfaces: password-protected PDFs, encrypted ZIP files, university VPN clients. The Text Encryptor takes a different approach: it exposes the cryptographic primitive, AES-256-GCM, through the browser's built-in Web Crypto API, giving students a tool that is both transparent in its implementation and auditable in its security guarantees. For computer science and cryptography students, this is not just a utility; it is a working example of GCM as specified in NIST SP 800-38D that can be inspected, verified, and understood at every layer.
The Four-Stage Encryption Pipeline
When a student enters research data (survey responses, interview transcripts, experimental results) and clicks Encrypt, a four-stage pipeline executes entirely within the browser's JavaScript runtime. Each stage involves specific cryptographic decisions that have direct implications for data confidentiality, integrity, and the security of the encrypted output in academic contexts.
Stage 1: Salt Generation and Passphrase Key Derivation via PBKDF2. The user's passphrase is not used directly as the AES-256 encryption key. The tool generates a cryptographically random 128-bit (16-byte) salt using crypto.getRandomValues(), which draws on the operating system's CSPRNG. The salt and passphrase are fed into PBKDF2 with HMAC-SHA-256 as the pseudorandom function. The tool uses 100,000 iterations. Note that this is below the 600,000 iterations that OWASP's Password Storage Cheat Sheet has recommended for PBKDF2-HMAC-SHA256 since 2023, so passphrase strength matters all the more. PBKDF2's iteration count is the primary defense against brute-force attacks: the attacker's cost per guess grows linearly with the iteration count, so doubling the iterations doubles the work. At 100,000 iterations, a single passphrase guess takes approximately 50-100 milliseconds on a modern CPU: slow enough to make brute-force attacks impractical against strong passphrases (60+ bits of entropy), fast enough to be imperceptible during legitimate encryption. The output is a 256-bit key, exactly the key length required by AES-256.
The key is imported as extractable: false, a critical security property. A non-extractable CryptoKey exists only as an opaque handle inside the browser's cryptographic subsystem. Page JavaScript cannot export the raw key bytes through the Web Crypto API, and the handle becomes unreachable once the CryptoKey object is garbage-collected. The derived key is therefore needed only for the duration of the encrypt/decrypt operation and is not directly readable by other scripts on the page. The passphrase itself still passes through JavaScript as an ordinary string, so this property protects the derived key, not the passphrase.
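As a minimal sketch of Stage 1, assuming the parameters stated above (16-byte salt, PBKDF2 with HMAC-SHA-256, 100,000 iterations, non-extractable AES-GCM key), the derivation could look roughly like this with the Web Crypto API. The function names deriveAesKey and demoDeriveKey are hypothetical and are not taken from the tool's source.

```javascript
// Sketch: derive a non-extractable AES-256-GCM key from a passphrase with PBKDF2.
// Function names are illustrative; the parameters follow the article's description.
async function deriveAesKey(passphrase, salt, iterations = 100_000) {
  // Browsers and recent Node expose globalThis.crypto; the import is a fallback for older Node.
  const cryptoApi = globalThis.crypto ?? (await import('node:crypto')).webcrypto;

  // Wrap the passphrase bytes as PBKDF2 key material (usable only for derivation).
  const material = await cryptoApi.subtle.importKey(
    'raw',
    new TextEncoder().encode(passphrase),
    'PBKDF2',
    false,
    ['deriveKey']
  );

  return cryptoApi.subtle.deriveKey(
    { name: 'PBKDF2', hash: 'SHA-256', salt, iterations },
    material,
    { name: 'AES-GCM', length: 256 },
    false, // extractable: false, so exportKey() on the result is rejected
    ['encrypt', 'decrypt']
  );
}

async function demoDeriveKey() {
  const cryptoApi = globalThis.crypto ?? (await import('node:crypto')).webcrypto;
  const salt = cryptoApi.getRandomValues(new Uint8Array(16)); // 128-bit random salt
  return deriveAesKey('example passphrase', salt);
}
```

Calling crypto.subtle.exportKey('raw', key) on the returned key rejects, which is the observable effect of the non-extractable flag.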
Stage 2: Initialization Vector (IV) Generation. The tool generates a cryptographically random 96-bit (12-byte) IV using crypto.getRandomValues(). The 96-bit length is the one NIST SP 800-38D recommends for GCM: a 96-bit IV is used directly to form the initial counter block, whereas any other length must first be processed through GHASH. With random 96-bit IVs, the probability of an accidental IV repeat across 2^32 encryption operations (4.3 billion) under one key is about 2^-33 by the birthday bound, within the 2^-32 limit the standard sets, and effectively zero in practice. IV reuse with the same key is catastrophic in GCM: it exposes the XOR of the two plaintexts and can let an attacker recover the GHASH subkey H, which is enough to forge authentication tags.
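The collision figure can be checked with the usual birthday approximation, p ≈ n^2 / 2^(b+1) for n random b-bit values. The helper below is a hypothetical illustration that works in log2 space to avoid floating-point underflow; it is an estimate that holds only while p is small.

```javascript
// Sketch: birthday-bound estimate for random IV collisions under a single key.
// log2(p) ≈ 2*log2(n) - (ivBits + 1), valid while the resulting p is much less than 1.
function log2CollisionProbability(log2Messages, ivBits = 96) {
  return 2 * log2Messages - (ivBits + 1);
}

// 2^32 encryptions with 96-bit IVs: exponent -33, i.e. a probability of about 2^-33.
const exponent = log2CollisionProbability(32);
console.log(`collision probability is about 2^${exponent}`);
```

This is consistent with the order of magnitude quoted above: roughly 2^-33, which sits inside the 2^-32 limit that NIST SP 800-38D places on IV repetition.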
Stage 3: AES-256-GCM Encryption via SubtleCrypto.encrypt(). The 256-bit derived key, the 96-bit random IV, and the plaintext are passed to crypto.subtle.encrypt() with the algorithm identifier { name: 'AES-GCM', iv, tagLength: 128 }. AES-GCM combines two computations: AES in counter (CTR) mode encrypts the plaintext blocks to produce ciphertext, and GHASH computes the 128-bit authentication tag over that ciphertext. The tag is a MAC computed as a polynomial evaluation in GF(2^128). Any modification to the ciphertext, even a single bit flip, makes the recomputed tag differ from the transmitted one, so decryption fails.
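To make the encrypt call concrete, here is a small sketch that isolates Stage 3. For brevity it assumes a randomly generated AES-256 key as a stand-in for the PBKDF2-derived key, and the name encryptOnce is hypothetical. It also shows that Web Crypto returns the ciphertext with the 16-byte tag appended.

```javascript
// Sketch: the AES-GCM encrypt call in isolation.
async function encryptOnce(plaintext) {
  // Browsers and recent Node expose globalThis.crypto; the import is a fallback for older Node.
  const cryptoApi = globalThis.crypto ?? (await import('node:crypto')).webcrypto;

  // Assumption: a random key stands in for the passphrase-derived key from Stage 1.
  const key = await cryptoApi.subtle.generateKey(
    { name: 'AES-GCM', length: 256 },
    false,
    ['encrypt', 'decrypt']
  );
  const iv = cryptoApi.getRandomValues(new Uint8Array(12)); // fresh 96-bit IV

  const output = new Uint8Array(
    await cryptoApi.subtle.encrypt(
      { name: 'AES-GCM', iv, tagLength: 128 },
      key,
      new TextEncoder().encode(plaintext)
    )
  );

  // The result is ciphertext followed by the 128-bit (16-byte) authentication tag.
  const ciphertext = output.slice(0, output.length - 16);
  const tag = output.slice(output.length - 16);
  return { iv, ciphertext, tag };
}
```

Because CTR mode is a stream construction, the ciphertext has the same length as the UTF-8 plaintext; only the tag adds overhead.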
Stage 4: Output Encoding and Packaging. The raw binary output (a 16-byte salt, a 12-byte IV, and the ciphertext including the authentication tag) is concatenated into a single ArrayBuffer and encoded as Base64 for transport. The Base64 encoding ensures the ciphertext passes safely through text-based channels: email bodies, messaging apps, Google Docs, cloud storage notes, and version control. The output format is self-contained: the salt and IV are prepended to the ciphertext, so the decryption function can extract them directly without separate out-of-band communication.
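The packaging step can be sketched as follows, assuming the layout described above (salt, then IV, then ciphertext with tag). The helper names pack and unpack are hypothetical, and the tool's exact byte layout beyond this description is an assumption.

```javascript
// Sketch: pack salt || iv || ciphertext+tag into Base64, and split it again.
const SALT_BYTES = 16;
const IV_BYTES = 12;

function pack(salt, iv, ciphertextWithTag) {
  const bytes = new Uint8Array(salt.length + iv.length + ciphertextWithTag.length);
  bytes.set(salt, 0);
  bytes.set(iv, salt.length);
  bytes.set(ciphertextWithTag, salt.length + iv.length);

  // btoa expects a "binary string" with one character per byte.
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

function unpack(base64) {
  const bytes = Uint8Array.from(atob(base64), (ch) => ch.charCodeAt(0));
  return {
    salt: bytes.slice(0, SALT_BYTES),
    iv: bytes.slice(SALT_BYTES, SALT_BYTES + IV_BYTES),
    ciphertextWithTag: bytes.slice(SALT_BYTES + IV_BYTES),
  };
}
```

Building the binary string in a loop rather than with String.fromCharCode(...bytes) avoids argument-count limits when the encrypted text is large.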
Critical Security Property: The IV and Salt Are Public
The salt and IV are prepended to the ciphertext in the clear (they are not encrypted). This is cryptographically sound and standard practice: the salt's purpose is to prevent rainbow table attacks (it must be unique, not secret), and the IV's purpose is to ensure ciphertext uniqueness per encryption operation (it must be unique, not secret). The security of the encrypted data rests entirely on the passphrase's entropy. Never rely on the secrecy of the salt or IV; they are part of the ciphertext format, not part of the key material.
The GHASH Authentication Tag: A Polynomial in GF(2^128)
The GHASH function is the mathematical core of GCM's integrity guarantee. It operates in the finite field GF(2^128) defined by the irreducible polynomial P(x) = x^128 + x^7 + x^2 + x + 1. The authentication tag T is computed as:
T = A_1·H^m + A_2·H^(m-1) + ... + A_m·H + E_k(N || 0^31 || 1)
Where H = E_k(0^128) is the hash subkey (AES encryption of a zero block under the main key), A_i are the 128-bit blocks of associated data and ciphertext (followed by a final block encoding their lengths), addition is XOR, and E_k(N || 0^31 || 1) is the encrypted nonce block that masks the polynomial output. Any modification to the ciphertext changes one or more A_i blocks, which shifts the polynomial's value by the block difference multiplied by the corresponding power of H; since H is derived from the secret key, the attacker cannot predict how to adjust the tag to match. The probability that a single forgery attempt goes undetected is on the order of 2^-128 for short messages: for all practical purposes, zero.
For cryptography students: this is a concrete instantiation of the Wegman-Carter universal hash function family, applied in an Encrypt-then-MAC construction. GCM's polynomial evaluation in GF(2^128) is highly efficient in hardware (leveraging carry-less multiplication instructions like PCLMULQDQ on x86) but requires careful implementation in software to avoid timing side channels.
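For readers who want to experiment with the field arithmetic, the following sketch implements GF(2^128) multiplication and the GHASH polynomial with BigInt, following the bitwise multiplication algorithm in NIST SP 800-38D. It is illustrative only: it is not constant-time, it is not how browsers implement GCM, and the names gfMul and ghash are hypothetical.

```javascript
// Sketch of GHASH arithmetic with BigInt (illustrative, NOT constant-time).
// A 128-bit block is held as a big-endian integer; in GCM's bit convention the
// most significant bit is the coefficient of x^0.
const R = 0xe1n << 120n; // reduction constant for x^128 + x^7 + x^2 + x + 1

// Multiply two field elements, scanning the bits of x from the leftmost bit.
function gfMul(x, y) {
  let z = 0n;
  let v = y;
  for (let i = 127n; i >= 0n; i--) {
    if ((x >> i) & 1n) z ^= v;              // add (XOR) v when this bit of x is set
    v = (v & 1n) ? (v >> 1n) ^ R : v >> 1n; // multiply v by x, reducing mod P(x)
  }
  return z;
}

// Horner evaluation over the blocks X_1..X_m:
// (((X_1·H) ^ X_2)·H ^ ...)·H = X_1·H^m + X_2·H^(m-1) + ... + X_m·H
function ghash(h, blocks) {
  return blocks.reduce((y, block) => gfMul(y ^ block, h), 0n);
}
```

In a full GCM computation, blocks would hold the padded associated data, the padded ciphertext, and a final length block, and the tag would be the GHASH result XORed with the encrypted initial counter block.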
The PBKDF2 Key Derivation: Mathematics and Security Analysis
PBKDF2 (defined in RFC 8018 / PKCS #5 v2.1) is the industry-standard function for deriving cryptographic keys from passphrases. Its security model is based on computational hardness: it makes key derivation deliberately expensive to slow down brute-force attacks, while remaining fast enough (50-100ms) to be imperceptible during legitimate use.
How PBKDF2 Works Internally
PBKDF2 applies a pseudorandom function (PRF), HMAC-SHA-256 in this implementation, repeatedly to the passphrase and salt. The derived key is computed as:
DK = T_1 || T_2 || ... || T_n
Where each block T_i is computed through c iterations (c = 100,000) of the PRF:
T_i = U_1 ⊕ U_2 ⊕ ... ⊕ U_c
With U_1 = PRF(Passphrase, Salt || INT(i)), and U_j = PRF(Passphrase, U_(j-1)) for j > 1, where INT(i) is the block index i encoded as a 4-byte big-endian integer. The XOR chain ensures that every iteration contributes to the final output, so an attacker cannot skip iterations without affecting the result. For a 256-bit key, only one block (T_1) is needed, so n = 1, and the output is a single 32-byte block produced through 100,000 HMAC-SHA-256 computations.
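The recurrence above can be written out by hand using HMAC-SHA-256 from the Web Crypto API. This sketch computes only the first block T_1 (which is all a 256-bit key needs) and is meant for study rather than production use, since the built-in PBKDF2 is much faster. The function name pbkdf2FirstBlock is hypothetical.

```javascript
// Sketch: PBKDF2 block T_1 computed manually with HMAC-SHA-256 as the PRF.
async function pbkdf2FirstBlock(passphrase, salt, iterations) {
  // Browsers and recent Node expose globalThis.crypto; the import is a fallback for older Node.
  const cryptoApi = globalThis.crypto ?? (await import('node:crypto')).webcrypto;

  const hmacKey = await cryptoApi.subtle.importKey(
    'raw',
    new TextEncoder().encode(passphrase),
    { name: 'HMAC', hash: 'SHA-256' },
    false,
    ['sign']
  );
  const prf = async (data) =>
    new Uint8Array(await cryptoApi.subtle.sign('HMAC', hmacKey, data));

  // U_1 = PRF(Passphrase, Salt || INT(1)); INT(1) is the 4-byte big-endian block index.
  const firstInput = new Uint8Array(salt.length + 4);
  firstInput.set(salt, 0);
  firstInput.set([0, 0, 0, 1], salt.length);

  let u = await prf(firstInput);
  const t = u.slice(); // running XOR: T_1 = U_1 ^ U_2 ^ ... ^ U_c
  for (let j = 2; j <= iterations; j++) {
    u = await prf(u); // U_j = PRF(Passphrase, U_(j-1))
    for (let k = 0; k < t.length; k++) t[k] ^= u[k];
  }
  return t; // 32 bytes, usable as a 256-bit key
}
```

For the same passphrase, salt, and iteration count, the result should match the first 32 bytes returned by crypto.subtle.deriveBits with PBKDF2 and SHA-256, which is a convenient way to check one's understanding of the construction.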
Entropy Requirements for Student Passphrases
The security of the entire system rests on the passphrase's entropy. A passphrase with 20 bits of entropy (e.g., "student123") can be brute-forced in approximately 1,000,000 guesses. At 50 ms per guess (PBKDF2 with 100,000 iterations), that is approximately 14 hours, feasible for a determined attacker with a laptop. A passphrase with 60 bits of entropy requires approximately 10^18 guesses; at 50 ms per guess, that is approximately 1.6 billion years. For reference, a random diceware phrase carries about 12.9 bits per word, so five words already exceed 60 bits, and an 8-word phrase like "correct horse battery staple oxygen mountain river cloud" exceeds 100 bits. These figures assume a single CPU; an attacker with GPUs or many machines can guess orders of magnitude faster, which is why extra margin is worthwhile. The difference between 20 bits and 60 bits of entropy is the difference between an overnight attack and one that is out of practical reach.
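The arithmetic behind these estimates is easy to reproduce. The sketch below assumes a fixed 50 ms per guess on a single CPU and an exhaustive search of the full keyspace (the average case is half of that); the helper name bruteForceSeconds is hypothetical.

```javascript
// Sketch: exhaustive-search time at a fixed cost per guess (single CPU assumed).
const SECONDS_PER_YEAR = 365.25 * 24 * 3600;

function bruteForceSeconds(entropyBits, secondsPerGuess = 0.05) {
  // Full keyspace of 2^entropyBits guesses; the expected (average) time is half this.
  return 2 ** entropyBits * secondsPerGuess;
}

const hoursFor20Bits = bruteForceSeconds(20) / 3600;             // about 14.6 hours
const yearsFor60Bits = bruteForceSeconds(60) / SECONDS_PER_YEAR; // about 1.8 billion years

console.log(hoursFor20Bits.toFixed(1), 'hours;', yearsFor60Bits.toExponential(2), 'years');
```

Using exact powers of two gives about 1.8 billion years for 60 bits; the 1.6 billion figure in the text comes from rounding 2^60 down to 10^18.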
Use the Password Generator to create research-grade passphrases. For IRB-governed data, generate a unique 20-character random passphrase for each research project. Document the passphrase generation method in your data management plan: auditors and IRB reviewers appreciate evidence that passphrases were generated using a CSPRNG rather than chosen by a human. Store the passphrase separately from the encrypted data (e.g., in a university-approved password manager, never in the same cloud folder as the ciphertext).
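As an illustration of what CSPRNG-based generation means in practice, here is a hedged sketch of a random passphrase generator with an entropy estimate. It is not the Password Generator's actual code; the alphabet, the function names randomPassphrase and entropyBits, and the rejection-sampling approach are assumptions chosen for the example.

```javascript
// Sketch: CSPRNG-backed random passphrase with uniform character selection.
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'; // 62 symbols

function randomPassphrase(length = 20, cryptoApi = globalThis.crypto) {
  // Accept only bytes below the largest multiple of 62 (248) to avoid modulo bias.
  const limit = 256 - (256 % ALPHABET.length);
  const out = [];
  while (out.length < length) {
    const bytes = cryptoApi.getRandomValues(new Uint8Array(length));
    for (const b of bytes) {
      if (b < limit && out.length < length) out.push(ALPHABET[b % ALPHABET.length]);
    }
  }
  return out.join('');
}

// Entropy of a uniformly random string: length * log2(alphabet size).
function entropyBits(length, alphabetSize = ALPHABET.length) {
  return length * Math.log2(alphabetSize);
}
```

Under these assumptions, a 20-character passphrase over 62 symbols carries 20 × log2(62), roughly 119 bits, comfortably above the 80-bit target mentioned elsewhere in this guide.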
Practical Applications for Student Research and Academic Work
Application 1: Encrypting Survey Response Data for IRB Compliance
Problem: A graduate student collecting survey data with personally identifiable information (PII) needs to store and share responses with their advisor while maintaining IRB-mandated data protection. University cloud storage (Google Drive, OneDrive) is convenient but does not provide end-to-end encryption: the university IT department and the cloud provider can technically access the data. The student needs content-level encryption that protects survey responses even if the cloud account is compromised.
Solution: The student exports survey responses as CSV or JSON, pastes the raw data into the Text Encryptor, encrypts with a project-specific passphrase, and stores the ciphertext in a university cloud folder. The passphrase is communicated to the advisor through a separate channel (in-person meeting, phone call). The advisor decrypts locally using the same tool and passphrase. The ciphertext in cloud storage is useless without the passphrase, which addresses the technical side of the IRB's data protection requirement. The student documents the encryption protocol (AES-256-GCM, client-side processing, separate passphrase channel) in the data management plan. For the IRB application, this protocol demonstrates that data is protected at the content level, not just at the transport level.
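The student-to-advisor workflow corresponds to an encrypt/decrypt round trip. The sketch below strings the four stages together under the parameters described in this guide; the function names and the exact Base64 layout (16-byte salt, 12-byte IV, then ciphertext with tag) are assumptions, so output from this sketch is not guaranteed to be interchangeable with the tool's own output.

```javascript
// Sketch: passphrase-based encrypt/decrypt round trip (names and layout are illustrative).
async function getCrypto() {
  // Browsers and recent Node expose globalThis.crypto; the import is a fallback for older Node.
  return globalThis.crypto ?? (await import('node:crypto')).webcrypto;
}

async function deriveKey(cryptoApi, passphrase, salt) {
  const material = await cryptoApi.subtle.importKey(
    'raw', new TextEncoder().encode(passphrase), 'PBKDF2', false, ['deriveKey']);
  return cryptoApi.subtle.deriveKey(
    { name: 'PBKDF2', hash: 'SHA-256', salt, iterations: 100_000 },
    material, { name: 'AES-GCM', length: 256 }, false, ['encrypt', 'decrypt']);
}

async function encryptText(passphrase, plaintext) {
  const cryptoApi = await getCrypto();
  const salt = cryptoApi.getRandomValues(new Uint8Array(16));
  const iv = cryptoApi.getRandomValues(new Uint8Array(12));
  const key = await deriveKey(cryptoApi, passphrase, salt);
  const body = new Uint8Array(await cryptoApi.subtle.encrypt(
    { name: 'AES-GCM', iv, tagLength: 128 }, key, new TextEncoder().encode(plaintext)));

  // Layout: salt || iv || ciphertext+tag, then Base64.
  const packed = new Uint8Array(28 + body.length);
  packed.set(salt, 0);
  packed.set(iv, 16);
  packed.set(body, 28);
  let binary = '';
  for (const b of packed) binary += String.fromCharCode(b);
  return btoa(binary);
}

async function decryptText(passphrase, base64) {
  const cryptoApi = await getCrypto();
  const packed = Uint8Array.from(atob(base64), (ch) => ch.charCodeAt(0));
  const salt = packed.slice(0, 16);
  const iv = packed.slice(16, 28);
  const body = packed.slice(28);
  const key = await deriveKey(cryptoApi, passphrase, salt);
  // Rejects if the passphrase is wrong or the data was modified (tag mismatch).
  const plain = await cryptoApi.subtle.decrypt(
    { name: 'AES-GCM', iv, tagLength: 128 }, key, body);
  return new TextDecoder().decode(plain);
}
```

In this sketch the advisor would call decryptText with the passphrase received through the separate channel; a wrong passphrase produces a rejected promise rather than garbled text.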
Application 2: Securing Collaborative Research Across Institutions
Problem: A multi-institution research collaboration shares preliminary findings, datasets, and analysis scripts through email and shared cloud folders. The collaboration involves four universities with different IT policies, different cloud platforms, and different security postures. Sensitive research data (unpublished results, patentable discoveries, grant proposals under embargo) travels through channels that none of the PIs fully control. A data leak before publication could compromise research priority, and the decentralized infrastructure makes it impossible to enforce uniform access controls.
Solution: The research group establishes a collaboration passphrase at the project kickoff meeting. All sensitive research data (preliminary analyses, draft manuscripts, unpublished datasets) is encrypted through the Text Encryptor before sharing through any channel. The ciphertext is shared through email, Slack, or cloud storage as usual, but the content is cryptographically protected regardless of the transport channel or storage platform. The passphrase never travels through the same channels as the data. If any collaborator's email is compromised or cloud storage is misconfigured, the encrypted research data remains protected. The protocol adds approximately 15 seconds to each data-sharing action and requires zero new software, since every collaborator already has a browser. This is particularly valuable for cross-institution collaborations where imposing a uniform security infrastructure is politically and practically impossible.
Application 3: Protecting Student Journalism and Sensitive Interview Data
Problem: A student journalist or ethnographer collects sensitive interview data (whistleblower accounts, vulnerable population narratives, politically sensitive testimony) that must be protected from unauthorized access. The data is stored on a personal laptop that could be lost or stolen, and shared with editors or advisors through standard communication channels. Source confidentiality is an ethical and sometimes legal obligation, and a data breach could expose sources to retaliation or harm.
Solution: The student encrypts interview transcripts and notes through the Text Encryptor immediately after each interview session, before the data is stored on any device or shared with anyone. The encrypted ciphertext is saved locally and backed up, and the original plaintext is deleted after encryption. The passphrase is memorized or stored in a password manager protected by a strong master password. When the editor or advisor needs to review the material, the student decrypts it locally on their own device and shares the plaintext through a secure channel (in-person, encrypted messaging), or shares the ciphertext with the passphrase communicated through a separate channel. Even if the student's laptop is confiscated, stolen, or searched, the encrypted interview data is cryptographically protected. For journalism programs and ethnography departments, this workflow provides a lightweight, zero-cost method for meeting source-protection obligations that would otherwise require expensive secure communication infrastructure.
Academic Threat Model: What the Text Encryptor Protects Against
A rigorous threat model is essential for academic researchers evaluating any encryption tool. Here is what the Text Encryptor's AES-256-GCM client-side encryption protects against in academic contexts, and what it explicitly does not protect against:
Protected Threats (Within the Tool's Security Model)
- Ciphertext interception in transit. An attacker who intercepts the Base64 ciphertext during email transmission, cloud sync, or messaging cannot recover the plaintext without the passphrase. Breaking AES-256-GCM itself is computationally infeasible with current or foreseeable computing technology.
- Cloud storage compromise. If the student's Google Drive, OneDrive, or Dropbox account is compromised, the attacker gains only Base64 ciphertext, which is useless without the separately managed passphrase. This is the key advantage of content-level encryption over platform-level access controls.
- Device loss or theft. If a laptop containing encrypted research data is lost or stolen, the data remains cryptographically protected. The attacker must brute-force the passphrase, which is infeasible for passphrases with 60+ bits of entropy given PBKDF2's 100,000 iterations.
- Ciphertext tampering. Any modification to the ciphertext (accidental corruption during file transfer, intentional tampering, bit flips in storage) is detected by GCM's authentication tag. The decryption fails rather than producing corrupted plaintext, preventing the use of tampered data in research analysis.
- University IT surveillance. Because encryption happens client-side, university network monitoring cannot inspect the plaintext content. Only Base64 ciphertext traverses the network.
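The ciphertext-tampering item in the list above can be demonstrated in a few lines. This sketch assumes a random AES-256-GCM key in place of the passphrase-derived one, and the function name tamperDemo is hypothetical; it flips a single bit and checks that decryption is rejected.

```javascript
// Sketch: a single flipped bit makes AES-GCM decryption fail.
async function tamperDemo() {
  // Browsers and recent Node expose globalThis.crypto; the import is a fallback for older Node.
  const cryptoApi = globalThis.crypto ?? (await import('node:crypto')).webcrypto;

  // Assumption: a random key stands in for the PBKDF2-derived key.
  const key = await cryptoApi.subtle.generateKey(
    { name: 'AES-GCM', length: 256 }, false, ['encrypt', 'decrypt']);
  const iv = cryptoApi.getRandomValues(new Uint8Array(12));
  const params = { name: 'AES-GCM', iv, tagLength: 128 };

  const sealed = new Uint8Array(await cryptoApi.subtle.encrypt(
    params, key, new TextEncoder().encode('participant_id,score\n17,42')));

  const tampered = sealed.slice();
  tampered[0] ^= 0x01; // flip the lowest bit of the first ciphertext byte

  // decrypt() resolves for intact data and rejects when the tag does not verify.
  const intactOk = await cryptoApi.subtle.decrypt(params, key, sealed)
    .then(() => true, () => false);
  const tamperedOk = await cryptoApi.subtle.decrypt(params, key, tampered)
    .then(() => true, () => false);

  return { intactOk, tamperedOk };
}
```

The intact ciphertext decrypts and the modified one does not, so a corrupted file surfaces as an error instead of silently altered research data.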
Unprotected Threats (Outside the Tool's Security Model)
- Compromised local device with keyloggers or malware. If the student's device is compromised with a keylogger, the passphrase can be captured during entry. The tool encrypts data within the browser; it cannot protect data before it enters the browser.
- Weak passphrase selection. A predictable passphrase like "research2025" carries little effective entropy: at roughly 20 bits, it can be brute-forced in approximately 15 hours at 50 ms per guess with PBKDF2 at 100,000 iterations. Use the Password Generator to create passphrases with 80+ bits of entropy.
- Passphrase compromise through side channels. If the passphrase is stored in the same cloud folder as the ciphertext, written on a sticky note, or communicated through the same channel as the encrypted data, the encryption provides no protection.
- Shoulder surfing and physical observation. If someone observes the student typing the passphrase or reading the plaintext on screen, the encryption is bypassed. This is a physical security concern, not a cryptographic one.
Academic IRB and Ethics Board Considerations
The Text Encryptor provides the encryption layer for research data protection. It does not replace the full IRB data management plan, which must also address: informed consent describing data protection measures, data retention and destruction schedules, breach notification procedures, and the specific regulatory framework governing your research (HIPAA for health data, FERPA for educational records, GDPR for EU subjects). Document the Text Encryptor's AES-256-GCM architecture and client-side processing in your IRB application as the technical safeguard, and describe your passphrase management protocol as the administrative safeguard.
Related Tools for Student Research and Academic Work
Your Academic Data Protection Toolkit
- Text Encryptor: The tool this deep-dive covers
- Password Generator: Generate high-entropy passphrases for research data
- Password Strength Checker: Verify passphrase entropy before use
- Hash Generator: Verify research data integrity with SHA-256
- Diff Checker: Compare research data versions and track changes
- Lorem Ipsum Generator: Generate placeholder text for academic document formatting
- ToolStand Blog: Security, privacy, and academic productivity guides
Frequently Asked Questions
How does AES-256-GCM provide both confidentiality and integrity, and why does that matter for academic research data?
AES-256-GCM is an authenticated encryption with associated data (AEAD) algorithm: it simultaneously provides confidentiality (the ciphertext cannot be read without the key) and integrity (any modification to the ciphertext is cryptographically detectable). This is achieved through GCM's two components: AES in CTR mode encrypts the plaintext into ciphertext, while the GHASH function computes a 128-bit authentication tag over the ciphertext. During decryption, the tag is recomputed and compared; if it doesn't match, decryption fails. For academic research, this is critical because data integrity is foundational to reproducible science: if research data is tampered with, intentionally or accidentally, the conclusions drawn from that data become unreliable. GCM's authenticated encryption ensures that any modification to encrypted research data is detected before the data is used in analysis. This is the key advantage of GCM over older modes like CBC, which provide confidentiality but no integrity guarantee on their own.
How does PBKDF2 derive the AES-256 key from a passphrase, and what makes 100,000 iterations secure?
PBKDF2 applies HMAC-SHA-256 repeatedly to the passphrase and a random 128-bit salt. The number of iterations (100,000) determines the computational cost: each iteration requires one HMAC-SHA-256 computation, and 100,000 iterations take approximately 50-100 milliseconds on a modern CPU. An attacker attempting to brute-force the passphrase must perform 100,000 HMAC-SHA-256 computations per guess. At 50 ms per guess on equivalent hardware, a passphrase with 60 bits of entropy (approximately 10^18 possible values) would take approximately 1.6 billion years to exhaustively search. For students, this means that even a moderately strong passphrase (a random 5-word diceware phrase provides about 64 bits) is protected by the iteration cost against brute force on commodity hardware. The iteration count only multiplies the attacker's work by a constant factor, so it cannot rescue a weak passphrase. The random 128-bit salt ensures that two students using the same passphrase produce different derived keys, preventing pre-computed rainbow table attacks.
Can the Text Encryptor be used for encrypting IRB-governed research data and survey responses?
Yes, with important caveats. The Text Encryptor provides the encryption layer (AES-256-GCM with client-side processing) that addresses the technical requirement for protecting sensitive research data. However, IRB compliance also requires: documented data handling procedures, informed consent describing data protection measures, secure passphrase management protocols, data retention and destruction policies, and breach notification procedures. Document the AES-256-GCM encryption standard and client-side architecture in your IRB application; describe the passphrase management protocol (how passphrases are generated, stored, and shared); and maintain records that encrypted data is protected even if cloud storage or email transmission is compromised. Many university IRBs appreciate the client-side architecture because research data never touches a third-party server, which simplifies the data security narrative.
What are the mathematical foundations of the GHASH authentication tag in GF(2^128)?
GHASH operates in the finite field GF(2^128), the Galois field of 2^128 elements, defined by the irreducible polynomial P(x) = x^128 + x^7 + x^2 + x + 1. The authentication tag T is computed as T = A_1·H^m + A_2·H^(m-1) + ... + A_m·H + E_k(N || 0^31 || 1), where H = E_k(0^128) is the hash subkey, A_i are the 128-bit blocks of associated data and ciphertext, and E_k(N || 0^31 || 1) is the encrypted nonce. Any modification to the ciphertext changes one or more A_i blocks, shifting the polynomial's value by the block difference multiplied by the corresponding power of H; since H is derived from the secret key, the attacker cannot predict how to adjust the tag. The probability of an undetected modification is on the order of 2^-128, which is cryptographically negligible. For students studying cryptography, this is a concrete instantiation of the Wegman-Carter universal hash function family in an Encrypt-then-MAC construction.
Is the Text Encryptor free for students and academic researchers?
Yes, completely free with no usage limits, no account required, and no premium tier, for individual students, research groups, and academic institutions. All encryption computation happens in the user's browser, so there is no per-user infrastructure cost. Students can use it throughout their academic career without ever paying or creating an account. The tool is accessible from university computers, library terminals, and personal devices. For CS and cryptography courses, the transparent client-side implementation (inspectable via browser developer tools) makes it valuable as a teaching tool: students can examine the Web Crypto API calls, verify PBKDF2 parameters, and confirm that encryption follows the NIST SP 800-38D specification. The tool is supported by non-intrusive advertising, keeping it accessible for the academic community regardless of institutional budget.