jonschlinkert / split-string

Split a string on a given character or characters, with support for escaping.

Home Page: https://github.com/jonschlinkert

License: MIT License

JavaScript 94.13% TypeScript 5.87%
string split util javascript nodejs node js jonschlinkert split-string string-split

split-string's Introduction

split-string

Easy way to split a string on a given character unless it's quoted or escaped.

Please consider following this project's author, Jon Schlinkert, and consider starring the project to show your ❤️ and support.

Install

Install with npm:

$ npm install --save split-string

Usage

const split = require('split-string');

console.log(split('a.b.c'));
//=> ['a', 'b', 'c']

// respects escaped characters
console.log(split('a.b.c\\.d'));
//=> ['a', 'b', 'c.d']

// respects double-quoted strings when the quotes option is set
console.log(split('a."b.c.d".e', { quotes: ['"'] }));
//=> ['a', '"b.c.d"', 'e']

Options

options.quotes

Type: Array|Boolean

Default: []

Description

Tell split-string not to split inside any of the quote characters specified on the quotes option. Each character signifies both the "opening" and "closing" character to use.

// default behavior
console.log(split('a.b."c.d.e.f.g".h.i'));
//=> [ 'a', 'b', '"c', 'd', 'e', 'f', 'g"', 'h', 'i' ]

// with quotes
console.log(split('a.b."c.d.e.f.g".h.i', { quotes: ['"'] }));
//=> [ 'a', 'b', '"c.d.e.f.g"', 'h', 'i' ]

// escaped quotes will be ignored
console.log(split('a.b.\\"c.d."e.f.g".h.i', { quotes: ['"'] }));
//=> [ 'a', 'b', '"c', 'd', '"e.f.g"', 'h', 'i' ]

// example of how to exclude non-escaped quotes from the result
let keep = (value, state) => {
  return value !== '\\' && (value !== '"' || state.prev() === '\\');
};
console.log(split('a.b.\\"c.d."e.f.g".h.i', { quotes: ['"'], keep }));
//=> [ 'a', 'b', '"c', 'd', 'e.f.g', 'h', 'i' ]

options.brackets

Type: Object|Boolean

Default: {}

Description

By default, no special significance is given to bracket-like characters (such as square brackets, curly braces, angle brackets, and so on).

// default behavior
console.log(split('a.{b.c}.{d.e}'));
//=> [ 'a', '{b', 'c}', '{d', 'e}' ]

When options.brackets is true, the following brackets types are supported:

{
  '<': '>',
  '(': ')',
  '[': ']',
  '{': '}'
}

For example:

console.log(split('a.{b.c}.{d.e}', { brackets: true }));
//=> [ 'a', '{b.c}', '{d.e}' ]

Alternatively, an object of brackets may be passed, where each key is the opening bracket and each value is the corresponding closing bracket. Note that the key and value must be different characters. If you want to use the same character for both open and close, use the quotes option.
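
The constraint on custom brackets (the opening and closing characters must differ) can be checked before the map is passed to split. The snippet below is only a sketch: validateBrackets is a hypothetical helper and is not part of split-string.

```javascript
// Hypothetical helper (not part of split-string): rejects a brackets map
// in which any opening character is the same as its closing character.
function validateBrackets(brackets) {
  for (const [open, close] of Object.entries(brackets)) {
    if (open === close) {
      throw new TypeError(
        `"${open}" is used as both opener and closer; use the quotes option instead`
      );
    }
  }
  return brackets;
}

// A valid map is returned unchanged, so the call can be inlined:
// split(input, { brackets: validateBrackets({ '{': '}' }) })
console.log(validateBrackets({ '{': '}', '«': '»' }));
```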

Examples

// no bracket support by default
console.log(split('a.{b.c}.[d.e].f'));
//=> [ 'a', '{b', 'c}', '[d', 'e]', 'f' ]

// tell split-string not to split inside curly braces
console.log(split('a.{b.c}.[d.e].f', { brackets: { '{': '}' }}));
//=> [ 'a', '{b.c}', '[d', 'e]', 'f' ]

// tell split-string not to split inside any of these types: "<>{}[]()"
console.log(split('a.{b.c}.[d.e].f', { brackets: true }));
//=> [ 'a', '{b.c}', '[d.e]', 'f' ]

// ...nested brackets are also supported
console.log(split('a.{b.{c.d}.e}.f', { brackets: true }));
//=> [ 'a', '{b.{c.d}.e}', 'f' ]

// tell split-string not to split inside the given custom types
console.log(split('«a.b».⟨c.d⟩.[e.f]', { brackets: { '«': '»', '⟨': '⟩' } }));
//=> [ '«a.b»', '⟨c.d⟩', '[e', 'f]' ]

options.keep

Type: function

Default: Function that returns true if the character is not \\.

Function that returns true when a character should be retained in the result.

Example

console.log(split('a.b\\.c')); //=> ['a', 'b.c']

// keep all characters
console.log(split('a.b\\.c', { keep: () => true })); //=> ['a', 'b\\.c']

options.separator

Type: string

Default: .

The character to split on.

Example

console.log(split('a.b,c', { separator: ',' })); //=> ['a.b', 'c']

Split function

Optionally pass a function as the last argument to tell split-string whether or not to split when the specified separator is encountered.

Example

// only split on "." when the "previous" character is "a"
console.log(split('a.b.c.a.d.e', state => state.prev() === 'a'));
//=> [ 'a', 'b.c.a', 'd.e' ]

The state object exposes the following properties:

  • input - (String) The un-modified, user-defined input string
  • separator - (String) The specified separator to split on
  • index - (Number) The current cursor position
  • value - (String) The character at the current index
  • bos - (Function) Returns true if position is at the beginning-of-string
  • eos - (Function) Returns true if position is at the end-of-string
  • prev - (Function) Returns the previously scanned character
  • next - (Function) Returns the next character after the current position
  • block - (Object) The "current" AST node.
  • stack - (Array) AST nodes
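
To make the role of the state helpers more concrete, here is a simplified, self-contained model of a splitter that consults a split function. It is a sketch under stated assumptions: miniSplit is a hypothetical name, it exposes only a subset of the properties listed above, it ignores quotes, brackets and escapes, and it is not the library's implementation.

```javascript
// Simplified model of a separator-based splitter (NOT split-string's source).
// `shouldSplit` receives a state object with a subset of the documented
// properties and decides whether to split at the current separator.
function miniSplit(input, separator, shouldSplit = () => true) {
  const parts = [''];
  let i = 0;
  const state = {
    input,
    separator,
    get index() { return i; },        // current cursor position
    get value() { return input[i]; }, // character at the cursor
    prev: () => input[i - 1],         // previously scanned character
    next: () => input[i + 1]          // character after the cursor
  };

  for (i = 0; i < input.length; i++) {
    if (input[i] === separator && shouldSplit(state)) {
      parts.push('');                 // start a new segment
    } else {
      parts[parts.length - 1] += input[i];
    }
  }
  return parts;
}

// only split on "." when the previous character is "a"
console.log(miniSplit('a.b.c.a.d.e', '.', state => state.prev() === 'a'));
//=> [ 'a', 'b.c.a', 'd.e' ]
```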

About

Contributing

Pull requests and stars are always welcome. For bugs and feature requests, please create an issue.

Running Tests

Running and reviewing unit tests is a great way to get familiarized with a library and its API. You can install dependencies and run tests with the following command:

$ npm install && npm test

Building docs

(This project's readme.md is generated by verb, please don't edit the readme directly. Any changes to the readme must be made in the .verb.md readme template.)

To generate the readme, run the following command:

$ npm install -g verbose/verb#dev verb-generate-readme && verb

Related projects

You might also be interested in these projects:

  • deromanize: Convert roman numerals to arabic numbers (useful for books, outlines, documentation, slide decks, etc)
  • randomatic: Generate randomized strings of a specified length using simple character sequences. The original generate-password.
  • repeat-string: Repeat the given string n times. Fastest implementation for repeating a string.
  • romanize: Convert numbers to roman numerals (useful for books, outlines, documentation, slide decks, etc)

Contributors

Commits Contributor
56 jonschlinkert
12 doowb
6 Ovyerus
1 silverwind

Author

Jon Schlinkert

License

Copyright © 2019, Jon Schlinkert. Released under the MIT License.


This file was generated by verb-generate-readme, v0.8.0, on April 22, 2019.

split-string's Issues

I, uh,

... have absolutely no idea how I forked your repository, was there an unmarked "Clone/Fork" button on your Twitter somewhere that I must've button-mashed by mistake? :|

But, oh yeah, I got two helper functions for doing these already. :D The one for parsing shell-like syntax was named smartSplit, and another which will only make sense to you if you've ever written a man-page...

// Implementation
function parseArgString(input){
	return input
		.replace(/^\s+|\s+$/g, "")
		.replace(/\\\\|\\\r?\n/g, "")
		.replace(/\\".*$/gm, "")
		.match(/""|"(?:""|\\ |[ \t]*[^\s"]+[ \t]*)+(?:"|$)|(?:\\ |[^"\s])(?:\\ |\S)*/g)
		.map(arg => ('"' === arg[0]
			? arg.replace(/^"|"$/g, "").replace(/""/g, '"')
			: arg).replace(/\\ /g, " "));
}
// Test
require("fs").readFileSync("test.roff", "utf8")
	.match(/^\.BI\s+(\S.*)$/gm)
	.map(test => test.replace(/^\.BI\s+/, ""))
	.map(test => {
		let i = 0;
		for(const arg of parseArgString(test))
			process.stdout.write(`\x1B[${(i++ % 2) ? 4 : 1}m${arg}\x1B[0m`);
		process.stdout.write("\n");		
	});

Fixture:

.de BI
\fB\\$1\fI\\$2\fB\\$3\fI\\$4\fB\\$5\fI\\$6\fB\\$7\fI\\$8\fB\\$9\fR
..
.nf
.BI B
.BI B I
.BI B I B
.BI "B"
.BI "B" "I"
.BI "B" "I" "B"
.BI "B
.BI "B" "I
.BI "B" "I" "B
.BI BBB
.BI BBB III
.BI "BBB BBB"
.BI "BBB BBB BBB" III
.BI "BBB"" BBB \" CCC" CCC
.BI BBB""BBB III BBB" III
.BI BBB"""BBB III
.BI "BBB"""III BBB
.BI BBB\ BBB
.BI "BBB\ BBB"
.BI "BBB\\ BBB"
.BI "BBB\\\ BBB"

And hey, it actually works! 😀

B
BI
BIB
B
BI
BIB
B
BI
BIB
BBB
BBBIII
BBB BBB
BBB BBB BBBIII
BBB" BBB
BBB""BBBIIIBBB"III
BBB"""BBBIII
BBB"IIIBBB
BBB BBB
BBB BBB
BBB BBB
BBB BBB

`split` has an error in IE11

After using split in my Angular project, the corresponding lazy-loaded module route cannot be accessed. The same problem occurs with union-values.

separator as an array or a function

I want this:

split('lorem ipsum@dolor sit@amet', { separator: [' ', '@']});
// result: ['lorem', 'ipsum', 'dolor', 'sit', 'amet']

split('lorem ipsum@dolor sit@amet', { separator: state => [' ', '@'].includes(state.value()) });
// result: ['lorem', 'ipsum', 'dolor', 'sit', 'amet']
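
Until something like this is supported, a plain-JavaScript workaround might look like the sketch below. splitOnAny is a hypothetical helper, not a split-string feature, and unlike split-string it does not honor escapes, quotes or brackets.

```javascript
// Hypothetical workaround (not a split-string feature): split on any of
// several single-character separators using a RegExp character class.
function splitOnAny(input, separators) {
  // Escape the characters that are special inside a character class.
  const escaped = separators.map(ch => ch.replace(/[\\\]^-]/g, '\\$&'));
  return input.split(new RegExp(`[${escaped.join('')}]`));
}

console.log(splitOnAny('lorem ipsum@dolor sit@amet', [' ', '@']));
//=> [ 'lorem', 'ipsum', 'dolor', 'sit', 'amet' ]
```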

change how quotes are handled?

Currently, quotes are handled using an array because usually, the start and end quotes are the same characters. I'd like to propose changing the handling to an object hash similar to how brackets are handled. This will allow handling crazy quotes like these: “ ”.

If this seems like a good way to go, I'll do an update. To stay backwards compatible, we can still allow an array to be passed in and convert it into an object.
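
A minimal sketch of that array-to-object conversion, assuming the object form maps each opening quote to its closing quote the way the brackets option does. normalizeQuotes is a hypothetical name, not an existing split-string API.

```javascript
// Sketch of the backwards-compatible conversion described above.
// `normalizeQuotes` is hypothetical and not part of split-string.
function normalizeQuotes(quotes) {
  if (Array.isArray(quotes)) {
    // Array form: each character opens and closes itself.
    return Object.fromEntries(quotes.map(ch => [ch, ch]));
  }
  // Object form: already maps opener to closer; return a shallow copy.
  return { ...quotes };
}

console.log(normalizeQuotes(['"', "'"])); // each quote maps to itself
console.log(normalizeQuotes({ '“': '”' })); // distinct open and close quotes
```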

-

Bundled split-string occupies 84kb. That includes buffer and other heavyweight dependencies. It would be really nice to reduce package size.

Documentation is outdated

The docs say:

// Brackets

Also respects brackets unless disabled:

split('a (b c d) e', ' ');
//=> ['a', '(b c d)', 'e']

and

split('a.b,c', {separator: ','});
//=> ['a.b', 'c']
 
// you can also pass the separator as a string as the last argument
split('a.b,c', ',');
//=> ['a.b', 'c']

Neither of these seems to hold anymore: brackets appear to be ignored, and the last argument does not change the separator.

Escaped backslash before quote

It appears the logic for when to respect a quote vs an escaped quote does not consider if the preceding escape char is itself escaped. For example:

$ node
Welcome to Node.js v12.3.1.
Type ".help" for more information.
> const split = require('split-string')
undefined
> console.log(split('\\\\"hello world\\\\"', { separator: ' ', quotes: ['"'] }))
[ '\\\\"hello', 'world\\\\"' ]
undefined
>

I would expect this to be [ '\\\\"hello world\\\\"' ].

Thoughts?
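
One common way to handle this is to count the run of backslashes immediately before the quote: an odd count means the quote is escaped, an even count means the backslashes escape each other. The sketch below illustrates that rule only; isEscaped is a hypothetical helper and not how split-string is currently implemented.

```javascript
// Illustration only: decides whether the character at `index` is escaped by
// counting the consecutive backslashes immediately before it.
function isEscaped(input, index) {
  let backslashes = 0;
  for (let i = index - 1; i >= 0 && input[i] === '\\'; i--) {
    backslashes++;
  }
  // odd: the last backslash escapes the character at `index`
  // even: the backslashes pair up and the character is not escaped
  return backslashes % 2 === 1;
}

console.log(isEscaped('\\"hello', 1));   //=> true  (one backslash before the quote)
console.log(isEscaped('\\\\"hello', 2)); //=> false (two backslashes before the quote)
```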

RegExp separator

I think it would be useful if split-string would, like String.prototype.split, support splitting using a RegExp as the separator. One example where this is useful is with multiple separators in series which yield empty strings:

require('split-string')('foo  bar', {separator: ' '})
// => [ 'foo', '', 'bar' ]

'foo  bar'.split(' ')
// => [ 'foo', '', 'bar' ]

But with String.prototype.split's RegExp separator, one can split on one or more separators:

> 'foo  bar'.split(/ +/)
// => [ 'foo', 'bar' ]
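
In the meantime, the empty segments can be filtered out after splitting. This is a workaround sketch, not a library feature; dropEmpty is a hypothetical helper, and String.prototype.split is used here only to keep the snippet self-contained.

```javascript
// Workaround sketch: drop the empty segments that consecutive separators
// produce. The same filter applies to any array of string segments.
const dropEmpty = segments => segments.filter(segment => segment !== '');

console.log(dropEmpty('foo  bar'.split(' ')));
//=> [ 'foo', 'bar' ]
```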

Index passed to state can sometimes be incorrect

The state parameter passed to the keep() function has an incorrect index when a quote is encountered. Looking at the code, state.index = i is set, but i is later changed without updating state.

 $ node 
Welcome to Node.js v12.3.1.
Type ".help" for more information.
> const split = require('split-string');
undefined
> split('a\\"', {
...   separator: ' ',
...   quotes: ['"'],
...   keep: (value, state) => {
.....     console.log({
.......       value,
.......       prev: state.prev(),
.......       index: state.index,
.......       charAtIndex: state.input[state.index],
.......       input: state.input
.......     });
.....     if (state.input[state.index] !== value) {
.......       throw new Error()
.......     }
.....     return true;
.....   }
... });
{
  value: 'a',
  prev: undefined,
  index: 0,
  charAtIndex: 'a',
  input: 'a\\"'
}
{ value: '\\', prev: 'a', index: 1, charAtIndex: '\\', input: 'a\\"' }
{ value: '"', prev: '\\', index: 1, charAtIndex: '\\', input: 'a\\"' }
Thrown:
Error
    at keep (repl:13:13)
    at append (./node_modules/split-string/index.js:42:18)
    at module.exports (./node_modules/split-string/index.js:68:9)
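
One possible way to avoid this kind of stale value is to expose index as a getter over the loop counter instead of copying it. The sketch below is an assumption about how a fix could look, not the library's actual code or patch; scan is a hypothetical, heavily simplified loop that only handles backslash escapes.

```javascript
// Sketch of one possible fix (an assumption, not split-string's code):
// `index` and `value` are getters over the loop counter, so they cannot go
// stale when the loop skips ahead past an escaped character.
function scan(input, keep) {
  let i = 0;
  const state = {
    input,
    get index() { return i; },
    get value() { return input[i]; },
    prev: () => input[i - 1]
  };
  let out = '';
  for (i = 0; i < input.length; i++) {
    if (keep(input[i], state)) out += input[i];
    if (input[i] === '\\' && i + 1 < input.length) {
      i++; // consume the escaped character; state.index follows automatically
      if (keep(input[i], state)) out += input[i];
    }
  }
  return out;
}

// The check from the report above no longer throws:
scan('a\\"', (value, state) => {
  if (state.input[state.index] !== value) throw new Error('stale index');
  return true;
});
```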
