✂ Text Splitter
Split CSV data, log files, lists, and any other text on a delimiter of your choice. Filter and sort the split items, remove duplicates, convert them to a JSON array, or rejoin them with a different delimiter. All processing runs in your browser, so no data is sent to a server.
How to Use
- 1. Enter text
Paste or type the text you want to split. Use the example buttons to quickly load sample data.
- 2. Choose delimiter
Select newline, comma, semicolon, space, tab, or pipe, or enter a custom delimiter.
- 3. Configure options
Toggle trim whitespace, remove empty items, and deduplicate as needed.
- 4. Use the results
Browse the list of split items, search to filter it, and sort it. Copy the result as a joined string or as a JSON array.
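For readers who want the same steps in a script, here is a minimal Python sketch of the split, trim, remove-empty, and deduplicate pipeline. It is not the tool's actual code (which runs in the browser); the function name `split_text` and its parameters are hypothetical, chosen only to mirror the options listed above.

```python
import json


def split_text(text, delimiter="\n", trim=True, remove_empty=True, dedupe=False):
    """Hypothetical helper mirroring the tool's options; not its real implementation."""
    if not delimiter:
        # str.split rejects an empty separator, so fail with a clearer message.
        raise ValueError("delimiter must not be empty")
    items = text.split(delimiter)
    if trim:
        # Strip leading and trailing whitespace from every item.
        items = [item.strip() for item in items]
    if remove_empty:
        # Drop items that are empty strings.
        items = [item for item in items if item]
    if dedupe:
        # dict.fromkeys keeps the first occurrence of each item, in order.
        items = list(dict.fromkeys(items))
    return items


row = "apple, banana, , apple ,cherry"
items = split_text(row, delimiter=",", dedupe=True)

as_json = json.dumps(items)    # JSON array string, ready to paste into code
rejoined = " | ".join(items)   # same items rejoined with a different delimiter

print(items)
print(as_json)
print(rejoined)
```

Trimming runs before empty-item removal in this sketch, so an item that contains only whitespace is dropped as well. Whether the tool applies its options in the same order is an assumption.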
Tips
- 💡 Split a CSV row by comma to quickly inspect individual column values.
- 💡 Use deduplicate to get a list of unique values in one click.
- 💡 Copy the JSON array output and paste it directly into your code.
- 💡 Use the filter search to find items containing a specific keyword.
FAQ
- Q. Why is text splitting important for LLMs?
- A. Large Language Models (LLMs) have a fixed context window — a maximum number of tokens they can process at once. For RAG (Retrieval-Augmented Generation) systems, long documents must be split into chunks that fit within this limit.
- Q. How do I choose the right chunk size?
- A. Typical chunk sizes range from 256 to 1024 tokens. Smaller chunks are more precise for retrieval but lose context; larger chunks preserve context but may include irrelevant content. Experiment based on your document type and LLM.
- Q. What is chunk overlap and why does it matter?
- A. Overlap is the number of characters or tokens shared between consecutive chunks. Because the end of one chunk is repeated at the start of the next, a sentence that straddles a boundary is less likely to be cut in two, which improves the quality of retrieved context in RAG pipelines.
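To make chunk size and overlap concrete, here is a minimal Python sketch of fixed-size chunking with overlap. It counts characters rather than tokens to stay self-contained; a real RAG pipeline would usually measure chunks with the model's tokenizer. The function name `chunk_with_overlap` and its parameters are hypothetical and are not part of this tool.

```python
def chunk_with_overlap(text, chunk_size, overlap):
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share `overlap` characters. Character-based
    sketch only; token-based chunking needs a tokenizer.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            # This chunk already reaches the end of the text; stop here
            # so no trailing chunk repeats only overlapped content.
            break
    return chunks


chunks = chunk_with_overlap("abcdefghij", chunk_size=4, overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk begins with the last two characters of the previous one, so content near a boundary appears in both neighbours. A larger overlap gives more protection at boundaries but produces more chunks to store and search.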