Assessment methods and tests should have validity and reliability data and research to back up their claims that the test is a sound measure.
Reliability is a very important concept and works in tandem with validity. A guiding principle in psychology is that a test can be reliable but not valid for a particular purpose; however, a test cannot be valid if it is unreliable.
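One way to see why an unreliable test cannot be valid is the standard classical test theory bound on correlations, sketched here as an illustration rather than as part of the original guidance. The symbols are assumptions introduced for this sketch: r_xy is the validity coefficient (test against criterion), r_xx is the reliability of the test and r_yy is the reliability of the criterion measure.

```latex
r_{xy} \;\le\; \sqrt{r_{xx}\, r_{yy}} \;\le\; \sqrt{r_{xx}}
```

Under this bound, a test with a reliability of 0.49 could not correlate higher than 0.70 with any criterion, however well the criterion itself is measured.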
Assessment methods including personality questionnaires, ability assessments, interviews, or any other assessment method are valid to the extent that the assessment method measures what it was designed to measure.
There are different aspects of validity and they differ in their focus.
The aspects of validity that have an impact on the actual scientific application of the assessment are construct validity, concurrent validity and predictive validity.
The two less relevant aspects of validity are face validity and content validity.
Construct validity is the theoretical focus of validity and is the extent to which performance on the test fits into the theoretical scheme and research already established on the attribute or construct the test is trying to measure. 
Concurrent validity is the relationship between test scores and some criterion measure of job performance or training performance collected at the same time as the test scores.
Predictive validity (Criterion Related Validity) is the extent to which a test or questionnaire predicts some future or desired outcome, for example work behaviour or on-the-job performance. This validity has obvious importance in personnel selection, recruitment and development. 

Face validity of a test or method concerns the look and feel of the assessment items and whether an applicant can see any relevance of the test or assessment method to the job or role concerned. On its own, face validity does not establish that the test is a good measure or a sound test.
Content validity of a test is concerned with how well a test samples the behavioural domain it is trying to measure.
The following table summarises some of the general research findings around the predictive validity of the different selection methods available:
Assessment Method | Predictive Validity |
Assessment Centres (multiple methods) | 0.65 |
Behavioural Interviews | 0.4 – 0.6 |
Work-sample Tests | 0.54 |
Ability Tests | 0.53 |
Modern Personality Tests | 0.39 |
Biographical data | 0.38 |
References | 0.23 |
Traditional Interviews | 0.05 – 0.19 |
Source: British Psychological Society/Accord Group
Assessment method | Predictive validity | Criterion measure |
Integrity Tests | 0.58 | |
Integrity Tests | 0.51 | Overall job performance |
Source: Comprehensive meta-analysis of integrity test validities by Ones, Viswesvaran & Schmidt (1993).
The following table illustrates how validity increases as test length increases. The calculations are based upon typical reliability and validity figures of .70 and .40 respectively for a 5-minute test. The difference in validity between a 5-minute test and a test of infinite length is only .078 (.478 - .400).
Test Time (Minutes) | Validity (r) |
1 | .270 |
2 | .332 |
3 | .365 |
4 | .386 |
5 | .400 |
6 | .410 |
7 | .410 |
8 | .418 |
9 | .424 |
10 | .430 |
11 | .434 |
12 | .437 |
13 | .440 |
14 | .443 |
15 | .445 |
Test of infinite length | .478 |
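The relationship between test length and validity can be sketched in a few lines of Python. The source does not state which formula was used, so this is an assumption: the sketch applies the standard Spearman-Brown-based formula for the validity of a lengthened test, which reproduces the 1-minute, 5-minute and infinite-length figures quoted above. The function names are hypothetical and chosen for illustration only.

```python
import math

def lengthened_validity(validity, reliability, length_factor):
    """Validity of a test whose length is multiplied by length_factor.

    Assumes the standard Spearman-Brown-based formula (not stated in the source).
    """
    return validity * math.sqrt(
        length_factor / (1 + (length_factor - 1) * reliability)
    )

def max_validity(validity, reliability):
    """Limit of the formula above as the test becomes infinitely long."""
    return validity / math.sqrt(reliability)

# Figures quoted in the text for the 5-minute reference test.
BASE_MINUTES = 5
BASE_VALIDITY = 0.40
BASE_RELIABILITY = 0.70

for minutes in (1, 5):
    factor = minutes / BASE_MINUTES  # length relative to the 5-minute test
    r = lengthened_validity(BASE_VALIDITY, BASE_RELIABILITY, factor)
    print(f"{minutes} min: {r:.3f}")

print(f"infinite: {max_validity(BASE_VALIDITY, BASE_RELIABILITY):.3f}")
```

The point of the sketch is the shape of the curve: most of the attainable validity is already present in a short test, and extra testing time buys progressively less.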
When we combine assessments in a battery we can increase the validity of the testing if the tests are of approximately the same validity and have low inter-correlations.
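As a rough illustration of why low inter-correlations matter, the sketch below makes some simplifying assumptions that do not come from the source: every test in the battery has the same validity, every pair of tests has the same inter-correlation, and standardised scores are simply added with equal weights. The function name `battery_validity` is hypothetical.

```python
import math

def battery_validity(validity, n_tests, inter_correlation):
    """Validity of an equally weighted sum of n_tests standardised scores.

    Assumes each test has the same validity and each pair of tests has the
    same inter-correlation (illustrative assumptions only).
    """
    # Covariance of the summed score with the criterion.
    covariance = n_tests * validity
    # Variance of the summed score: n unit variances plus pairwise covariances.
    variance = n_tests + n_tests * (n_tests - 1) * inter_correlation
    return covariance / math.sqrt(variance)

# Three tests, each with a validity of .40, at two levels of overlap.
for rho in (0.2, 0.8):
    print(f"inter-correlation {rho}: {battery_validity(0.40, 3, rho):.2f}")
```

With these illustrative inputs, three tests that overlap little (inter-correlation .2) give a battery validity of about .59, while three tests that largely duplicate each other (inter-correlation .8) give only about .43.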
Guilford & Fruchter (1978) summed up the different effects of lengthening tests and including more tests in a battery as follows:
An aptitude or personality assessment needs to measure each factor it is attempting to measure reliably, for the given population (e.g., customer service applicants, males, females).
Reliability is the consistency or precision with which the test or assessment method measures what it claims to measure.

Test-retest reliability is assessed by administering the same test to the same sample group of people twice. People may, however, remember or learn something in the first session that may impact how well they answer the test the second time around. Alternatively, reliability is measured through a split-half technique.
Test reliability is also represented by a correlation coefficient (r). As with validity coefficients, the closer the correlation coefficient is to 1 the better. While many personality tests are considered to have acceptable levels of reliability if they have reliability coefficients greater than r=0.7, ability tests should have reliability coefficients greater than r=0.8.
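A split-half estimate can be sketched as follows. This is a minimal illustration, not a prescribed procedure: it assumes an odd/even item split and the usual Spearman-Brown correction to full test length, and the item scores are made-up example data. All function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full test."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(item_scores):
    """item_scores holds one list of item scores per person."""
    # Sum the 1st, 3rd, 5th... items and the 2nd, 4th, 6th... items separately.
    odd_totals = [sum(person[0::2]) for person in item_scores]
    even_totals = [sum(person[1::2]) for person in item_scores]
    return spearman_brown(pearson_r(odd_totals, even_totals))

# Made-up data: five people, four items scored 1 (correct) or 0 (incorrect).
scores = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
reliability = split_half_reliability(scores)
print(f"split-half reliability: {reliability:.2f}")
print("meets 0.7 personality guideline:", reliability > 0.7)
print("meets 0.8 ability guideline:", reliability > 0.8)
```

The same `pearson_r` helper would give a test-retest estimate if it were applied to the total scores from the first and second sessions.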