
Data Sanitization and Validation in WordPress

Security matters for every website on the Internet. Nearly 30% of websites are built with WordPress. WordPress core itself is reasonably secure, but once plugins are involved, validating data becomes an important concern.

WordPress has functions to sanitize and validate inputs and outputs. Just remember: never trust user data. We have to validate users’ data everywhere. A single bug in one plugin can compromise the whole site, and even the server.

Differences Between Validation and Sanitization

There is a difference between sanitization and validation in WordPress.

Validation checks if input data is in the expected format or not. For example, we expect to get an email address from the user. When the user sends the data, we check if it is an email address or not.

Sanitization (or escaping) applies filters to the data to make it safe. For example, the data may contain an SQL injection payload; it is the job of sanitization to neutralize it so that the data is safe for the database.
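To make the distinction concrete, here is a minimal sketch in plain PHP. The two helper names are hypothetical and exist only for this illustration; inside WordPress you would normally reach for the functions described later in this post.

```php
<?php
// Hypothetical helpers for illustration; they are not part of WordPress.

// Validation: accept or reject. The input itself is never modified.
function demo_is_valid_email( $input ) {
    return filter_var( $input, FILTER_VALIDATE_EMAIL ) !== false;
}

// Escaping: always returns a string, made safe for an HTML context.
function demo_escape_html( $input ) {
    return htmlspecialchars( $input, ENT_QUOTES, 'UTF-8' );
}

var_dump( demo_is_valid_email( 'admin@example.com' ) ); // bool(true)
var_dump( demo_is_valid_email( 'not an email' ) );      // bool(false)
echo demo_escape_html( '<b>Hi</b>' );                   // &lt;b&gt;Hi&lt;/b&gt;
```

Validation answers a yes/no question and leaves the decision to you, while escaping changes the data so that it cannot be interpreted as markup.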

Here, “user” means whoever sends data to our website. Sometimes it really is a person filling in a form, and sometimes it is an API or a web service. No matter who the user is, validate and sanitize the data.

Even admins are users, and users will enter incorrect data on purpose or by accident. It’s your job to protect them from themselves.

Rules of Sanitizing and Validating in WordPress

Sanitization and validation may differ in detail between WordPress and other CMSes, but the following rules generally apply everywhere.

1. Trust Nobody

The idea is that you should not assume that any data entered by the user is safe. Nor should you assume that the data you’ve retrieved from the database is safe.

2. Validate Input, Escape Output

Validate your data as soon as you receive it from the user. Sanitize (or escape) the data when you want to show it.

3. Use WordPress Validator and Sanitizer

WordPress has its own functions for this job. We can use PHP filters too, but the WordPress functions cover many more cases and are easier to use.
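Put together, the three rules might look like the following sketch of a form handler. The function name, the field names ("name" and "email") and the "myplugin" text domain are assumptions made for this example, and the code only runs inside a loaded WordPress environment. A real handler would also verify a nonce and the user's capabilities, which is left out here.

```php
<?php
/**
 * Hypothetical handler for a contact form with "name" and "email" fields.
 * Assumes WordPress is loaded; in a plugin it would typically be attached
 * to a hook such as admin_post_{action}.
 */
function myplugin_handle_contact_form() {
    // Rule 1: treat everything in $_POST as untrusted.
    $name  = isset( $_POST['name'] ) ? sanitize_text_field( wp_unslash( $_POST['name'] ) ) : '';
    $email = isset( $_POST['email'] ) ? sanitize_email( wp_unslash( $_POST['email'] ) ) : '';

    // Rule 2, first half: validate as soon as the data arrives.
    if ( '' === $name || ! is_email( $email ) ) {
        wp_die( esc_html__( 'Please provide a name and a valid email address.', 'myplugin' ) );
    }

    // Rule 2, second half: escape at the point of output.
    printf( '<p>%s</p>', esc_html( sprintf( 'Thanks, %s!', $name ) ) );
}
```

Rule 3 shows up in the choice of helpers: every check and every filter above is a WordPress function rather than a hand-written regular expression.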

Data Validation in WordPress

First, let’s talk about validation. As you know, validation means checking whether data is in the expected format.

1. PHP Built-in Functions

  • isset: Determine if a variable is declared and is different than NULL.
if (isset($var)) {
    echo "This var is set so I will print.";
}
  • empty: Determine whether a variable is empty
if (empty($var)) {
    echo '$var is either 0, empty, or not set at all';
}
  • mb_strlen and strlen: Get the length of a string. strlen counts bytes; mb_strlen counts characters, and its second argument sets the character encoding.
mb_strlen($string, '8bit');
echo strlen($str);
  • preg_match: Perform a regular expression match.
preg_match('/(foo)(bar)(baz)/', 'foobarbaz', $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
  • strpos: Find the position of the first occurrence of a substring in a string
$mystring = 'abc';
$findme   = 'a';
$pos = strpos($mystring, $findme); // 0 here; compare with === false to detect "not found"
  • count: Count all elements in an array, or something in an object
  • in_array: Checks if a value exists in an array
$os = array("Mac", "NT", "Irix", "Linux");
if (in_array("Irix", $os)) {
    echo "Got Irix";
}
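As a rough sketch of how these built-ins combine, the following validator checks a small signup array. The function name, the field names and the limits are assumptions chosen for the example, not a WordPress API.

```php
<?php
// Hypothetical validator for illustration; field names and limits are assumptions.
function demo_validate_signup( array $data ) {
    $errors = array();

    // isset + strlen: the field must exist, be a string and have a sensible length.
    if ( ! isset( $data['username'] ) || ! is_string( $data['username'] ) || strlen( $data['username'] ) < 3 ) {
        $errors[] = 'username too short';
    } elseif ( ! preg_match( '/\A[a-z0-9_]+\z/i', $data['username'] ) ) {
        // preg_match: only letters, digits and underscores are allowed.
        $errors[] = 'username has invalid characters';
    }

    // in_array with strict comparison: the value must be one of a known set.
    $allowed_roles = array( 'subscriber', 'author' );
    if ( ! isset( $data['role'] ) || ! in_array( $data['role'], $allowed_roles, true ) ) {
        $errors[] = 'unknown role';
    }

    return $errors;
}

$errors = demo_validate_signup( array( 'username' => 'jo', 'role' => 'admin' ) );
echo count( $errors ); // 2
```

Returning a list of errors instead of a single boolean lets the caller show every problem at once, and count() on that list tells you whether the input passed.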

2. WordPress Functions

  • is_email: Verifies that an email is valid.
if ( is_email( 'email@domain.com' ) ) {
    echo 'email address is valid.';
}
  • term_exists: Determines whether a taxonomy term exists.
$term = term_exists( 'Uncategorized', 'category' );
if ( $term !== 0 && $term !== null ) {
    echo __( "'Uncategorized' category exists!", "textdomain" );
}
  • username_exists: Determines whether the given username exists.
$username = sanitize_user( $_POST['username'] );
if ( username_exists( $username ) ) {
    echo "Username In Use!";
} else {
    echo "Username Not In Use!";
}
  • validate_file: Validates a file name and path against an allowed set of rules.
$path = 'uploads/2012/12/my_image.jpg';
return validate_file( $path ); // Returns 0 (valid path)
$path = '../../wp-content/uploads/2012/12/my_image.jpg';
return validate_file( $path ); // Returns 1 (invalid path)
  • absint: Convert a value to non-negative integer.
echo absint(20.33);            // 20
echo absint(-20.33);           // 20
echo absint(false);            // 0
echo absint(true);             // 1
echo absint(array(10,20,30));  // 1
echo absint(NULL);             // 0
echo absint( 19.99 * 100 );    // 1998, not the expected 1999: the float product is slightly below 1999 and gets truncated
  • zeroise: Add leading zeros when necessary.
echo zeroise(70,4); // Prints 0070
  • wp_kses: Filters content so that only the allowed HTML tags and attributes remain. It accepts three arguments:
    • content: (string) Content to filter through kses
    • allowed_html: An array where each key is an allowed HTML element and the value is an array of allowed attributes for that element
    • allowed_protocols: Optional. Allowed protocol in links (for example http, mailto, feed, etc.)
$content = "<em>Click</em> <a title='click' href='http://honarsystems.com'>here</a> to visit <strong> Honar Systems </strong>";

echo wp_kses( $content, array(
    'strong' => array(),
    'a'      => array( 'href' => array() ), // attribute names are the array keys
) );

// Prints the HTML "Click <a href='http://honarsystems.com'>here</a> to visit <strong> Honar Systems </strong>"
  • wp_kses_post: Sanitizes content for allowed HTML tags for post content.
  • wp_kses_data: Sanitize content with allowed HTML KSES rules.
$s = '<div id="1st"><strong><i>Foo</i></strong><script>alert("Bar");</script></div>';
$x = wp_kses_data($s);
// Now, $x is <strong><i>Foo</i></strong>alert("Bar");
  • wp_kses_allowed_html: Returns an array of allowed HTML tags and attributes for a given context.
// strips all html (empty array)
$allowed_html = wp_kses_allowed_html( 'strip' );

// allows most inline elements and strips all block level elements except blockquote
$allowed_html = wp_kses_allowed_html( 'data' );

// very permissive: allows pretty much all HTML to pass - same as what's normally applied to the_content by default
$allowed_html = wp_kses_allowed_html( 'post' );

// allows a list of HTML entities such as &nbsp;
$allowed_html = wp_kses_allowed_html( 'entities' );
  • balanceTags: Balances tags if forced to, or if the ‘use_balanceTags’ option is set to true.
<?php
$html = '<ul>
  <li>this
  <li>is
  <li>a
  <li>list
</ul>';
echo balanceTags($html, true);
?>

Will output this HTML:

<ul>
  <li>this
  </li><li>is
  </li><li>a
  </li><li>list
</li></ul>
  • force_balance_tags: Balances tags of a string using a modified stack.
<?php $balanced_text = force_balance_tags( $text ); ?>

For example, the following HTML:

<div><b>This is an excerpt. <!--more--> and this is more text... </b></div>

should not break when the HTML after the more tag is cut off. The truncated fragment:

<div><b>This is an excerpt.

should be changed to:

<div><b>This is an excerpt. </b></div>

Data Sanitization in WordPress

Sanitization is the process of cleaning or filtering your input data. Whether the data comes from a user, an API or a web service, you sanitize it when you don’t know what to expect.

  • esc_html: Escaping for HTML blocks.
echo esc_html($title);
  • esc_attr: Escaping for HTML attributes.
<input type="text" name="myInput" value="<?php echo esc_attr($value);?>"/>
  • esc_js: Escapes single quotes, converts " < > & to HTML entities, and fixes line endings.
<a href="#" onclick="<?php echo esc_js( $custom_js ); ?>">Click me</a>
  • esc_url: Checks and cleans a URL.
<a href="<?php echo esc_url( home_url( '/' ) ); ?>">Home</a>
  • urlencode_deep: Navigates through an array, object, or scalar, and encodes the values to be used in a URL.
  • esc_textarea: Escaping for textarea values.
<textarea><?php echo esc_textarea( $text ); ?></textarea>
  • sanitize_email: Strips out all characters that are not allowable in an email.
$sanitized_email = sanitize_email('     admin@example.com!     ');
echo $sanitized_email; // will output: 'admin@example.com'
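  • sanitize_html_class: Sanitizes an HTML class name so that it only contains valid characters.
// If you want to explicitly style a post, you can use the sanitized version of the post title as a class
$post_class = sanitize_html_class( $post->post_title );
echo '<div class="' . $post_class . '">';
  • sanitize_mime_type: Sanitizes a MIME type.
sanitize_mime_type( $mime_type );
  • sanitize_option: Sanitizes various option values based on the nature of the option.
  • sanitize_text_field: Sanitizes a string from user input or from the database.
$str = "<h2>Title</h2>";
sanitize_text_field( $str ); // returns "Title" without any HTML tags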
  • sanitize_title: Sanitizes a string into a slug, which can be used in URLs or HTML attributes.
$new_url = sanitize_title('This Long Title is what My Post or Page might be');
echo $new_url;

It returns a formatted value; the output would be:
this-long-title-is-what-my-post-or-page-might-be
  • sanitize_title_with_dashes: Sanitizes a title, replacing whitespace and a few other characters with dashes.
echo sanitize_title_with_dashes("I'm in LOVE with WordPress!!!1");
// this will print: im-in-love-with-wordpress1
  • sanitize_user: Sanitizes a username, stripping out unsafe characters.
$username= sanitize_user('     marmelada     ');
echo $username; // will output: 'marmelada'
  • sanitize_hex_color: Sanitizes a 3 or 6 digit hex color with a leading #.
$wp_customize->add_setting( 'accent_color', array(
  'default' => '#f72525',
  'sanitize_callback' => 'sanitize_hex_color',
) );
  • wp_rel_nofollow: Adds a rel="nofollow" attribute to all HTML A elements in content.

Escaping with Localization

  • _e: Display translated text.
_e( 'Some text to translate and display.', 'textdomain' );
  • __: Retrieve the translation of $text.
$translated = __( 'Hello World!', 'mytextdomain' );

// these two are equivalent
echo __( 'translate text', 'textdomain' );
_e( 'translate text', 'textdomain' );
  • _x: Retrieve translated string with gettext context.
$translated = _x( 'Read', 'past participle: books I have read', 'textdomain' );
  • esc_html__: Retrieve the translation of $text and escapes it for safe use in HTML output.
<h1><?php echo esc_html__( 'Title', 'text-domain' ); ?></h1>
  • esc_html_e: Display translated text that has been escaped for safe use in HTML output.
<h1><?php esc_html_e( 'Title', 'text-domain' )?></h1>
  • esc_html_x: Translate string with gettext context, and escapes it for safe use in HTML output.
  • esc_attr__: Retrieve the translation of $text and escapes it for safe use in an attribute.
  • esc_attr_e: Display translated text that has been escaped for safe use in an attribute.
<input title="<?php esc_attr_e( 'Read More', 'your_text_domain' ) ?>" type="submit" value="submit" />
// Returns <input title="Read More" type="submit" value="submit" />
  • esc_attr_x: Translate string with gettext context, and escapes it for safe use in an attribute.

The WordPress documentation states that you should not use

echo __( 'translate text', 'textdomain' );

Instead, you should use the following function.

_e( 'translate text', 'textdomain' );
  • antispambot: Converts email addresses characters to HTML entities to block spam bots.
$email = "johndoe@example.com";
$email = sanitize_email($email);
echo '<a href="mailto:'.antispambot($email,1).'" title="Click to e-mail me" >'.antispambot($email).' </a>';
  • add_query_arg: Retrieves a modified URL query string.
// Assuming the current URL is "http://blog.example.com/client/?s=word"

// This would output '/client/?s=word&foo=bar&baz=tiny'
$arr_params = array( 'foo' => 'bar', 'baz' => 'tiny' );
echo esc_url( add_query_arg( $arr_params ) );
  • remove_query_arg: Removes an item or items from a query string.
// Assuming the current URL is "http://www.example.com/client/?details=value1&type=value2&date=value3"

// This would output '/client/?type=value2&date=value3'
echo esc_url( remove_query_arg( 'details' ) );

Database Escaping

WordPress has functions for interacting with the database. Some of them apply their own sanitization, but others do not, and then it is our duty to sanitize the input.

  • esc_sql: Escapes data for use in a MySQL query.
$name   = esc_sql( $name );
$status = esc_sql( $status );
 
$wpdb->get_var( "SELECT something FROM table WHERE foo = '$name' and status = '$status'" );
  • wpdb::esc_like: First half of escaping for LIKE special characters % and _ before preparing for MySQL.
$wild = '%';
$find = 'only 43% of planets';
$like = $wild . $wpdb->esc_like( $find ) . $wild;
$sql  = $wpdb->prepare( "SELECT * FROM $wpdb->posts WHERE post_content LIKE %s", $like );
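As a complement to esc_sql, here is a hedged sketch of a query built with $wpdb->prepare() placeholders. The function name is hypothetical and the code assumes WordPress is loaded so that the global $wpdb object exists.

```php
<?php
/**
 * Hypothetical lookup for illustration.
 * Assumes WordPress is loaded so that the global $wpdb exists.
 */
function myplugin_get_post_titles_by_author( $author_id, $status ) {
    global $wpdb;

    // %d forces an integer and %s is quoted and escaped by prepare(),
    // so the values are not passed through esc_sql() first.
    $sql = $wpdb->prepare(
        "SELECT post_title FROM {$wpdb->posts} WHERE post_author = %d AND post_status = %s",
        absint( $author_id ),
        $status
    );

    // get_col() returns the first column of every matching row as an array.
    return $wpdb->get_col( $sql );
}
```

With placeholders the quoting lives in one place, so there is no need to remember to wrap each string in quotes inside the SQL yourself.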

Redirect

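  • wp_redirect: Redirects to another page.
<?php wp_redirect( home_url() ); exit; ?>

Redirects can also be external, and/or use a “Moved Permanently” code:

<?php wp_redirect( 'http://www.example.com', 301 ); exit; ?>
  • wp_safe_redirect: Performs a safe (local) redirect, using wp_redirect().
if ( wp_safe_redirect( $url ) ) {
    exit;
}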
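When the redirect target comes from the request itself, it is worth validating it first. The sketch below assumes WordPress is loaded; the function name and the "redirect_to" query parameter are assumptions made for this example.

```php
<?php
/**
 * Hypothetical helper for illustration; assumes WordPress is loaded.
 * "redirect_to" is an assumed query parameter name.
 */
function myplugin_redirect_back() {
    // Fall back to the home page when no target was supplied.
    $target = isset( $_GET['redirect_to'] )
        ? esc_url_raw( wp_unslash( $_GET['redirect_to'] ) )
        : home_url( '/' );

    // wp_validate_redirect() returns the fallback when the host is not allowed.
    $target = wp_validate_redirect( $target, home_url( '/' ) );

    if ( wp_safe_redirect( $target ) ) {
        exit;
    }
}
```

wp_safe_redirect() already validates the host on its own; calling wp_validate_redirect() first simply lets you choose where rejected targets end up.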

Final Words About Sanitization and Validation in WordPress

Sanitization and validation in WordPress are security matters that every developer should be familiar with. If you want to develop your own plugin or theme, it is important to make sure that your code is safe.

One bug will destroy everything!